[PLUGIN-1957] Validate PK Chunking for incremental loads #354
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Sunish-Dahiya left a comment:
At a high level, the change looks fine, but can you please test all these scenarios?
Sunish-Dahiya left a comment:
The comment regarding unit tests has still not been addressed.
Sunish-Dahiya left a comment:
Please address the comments before merging.
Add all the test cases to SalesforceQueryUtilTest to reuse the existing PowerMockito class-level configuration for mocking query plan REST calls and …
```java
if (enablePKChunk && pkChunkCountCheck) {
  int chunkSize = config.getChunkSize();
  enablePKChunk = SalesforceSplitUtil.hasRequiredCountForPkChunking(
      query, authenticatorCredentials, chunkSize);
```
This can be inlined; we don't need a new variable.
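Applied to the reviewed excerpt, the suggestion amounts to passing config.getChunkSize() straight into the helper call. The sketch below is a minimal, self-contained illustration only: PkChunkConfig, SplitUtilStub, and InlineExample are hypothetical stand-ins, and the stub's simplified signature (an estimated record count instead of the query and credentials) is an assumption made so the example compiles on its own.

```java
// Hypothetical stand-in for the plugin's source config; only getChunkSize() matters here.
final class PkChunkConfig {
  private final int chunkSize;

  PkChunkConfig(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  int getChunkSize() {
    return chunkSize;
  }
}

// Hypothetical stand-in for SalesforceSplitUtil. The real helper takes the query and
// credentials; this stub takes an already-estimated count (assumption for illustration).
final class SplitUtilStub {
  static boolean hasRequiredCountForPkChunking(long estimatedCount, int chunkSize) {
    // Assumed rule: chunking is worthwhile only when the count reaches the chunk size.
    return estimatedCount >= chunkSize;
  }
}

final class InlineExample {
  static boolean resolvePkChunking(boolean enablePKChunk, boolean pkChunkCountCheck,
                                   long estimatedCount, PkChunkConfig config) {
    if (enablePKChunk && pkChunkCountCheck) {
      // config.getChunkSize() is passed directly; no intermediate local variable.
      enablePKChunk = SplitUtilStub.hasRequiredCountForPkChunking(
          estimatedCount, config.getChunkSize());
    }
    return enablePKChunk;
  }
}
```

The behavior is unchanged by the inlining; it only removes a local that is read exactly once.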
Bug: b/462086841
PLUGIN-1957 Autodetect PK Chunking for incremental loads
What
Adds an automatic pre-flight check in SalesforceBatchSource.getSplits() that decides whether to enable PK Chunking. When the number of matching records is below the configured chunk size (config.getChunkSize()) and the query is selective, PK chunking is skipped and the query runs as a standard, un-chunked query, avoiding "empty chunk" overhead on small datasets.
Why
PK chunking is designed for very large datasets. Enabling it on small datasets (for example, narrow incremental windows in which zero or very few records changed) creates dozens of empty chunk boundaries that are still spooled and executed, wasting the daily Bulk API allocation and adding significant pipeline overhead. This change ensures chunking is applied only when the actual volume of matching records justifies it.
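To make the overhead concrete, here is a rough, hedged model. Salesforce PK chunking splits a bulk query into chunks over the record Id range of the queried object, and each chunk runs as a separate batch, so the number of batches tracks the object's total size rather than how many rows the incremental filter matches. PkChunkMath and its methods are hypothetical; real chunk boundaries depend on how Ids are distributed, so treat the numbers as estimates, not as the connector's logic.

```java
final class PkChunkMath {
  private PkChunkMath() {}

  // Rough model (assumption): chunks are cut over the object's full Id range,
  // so the count depends on total object size, not on matching rows.
  static long chunkCount(long totalObjectRecords, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    return (totalObjectRecords + chunkSize - 1) / chunkSize; // ceiling division
  }

  // Each matching row lands in exactly one chunk, so at most this many chunks are non-empty.
  static long maxNonEmptyChunks(long matchingRecords, long chunks) {
    return Math.min(matchingRecords, chunks);
  }

  public static void main(String[] args) {
    long chunks = chunkCount(10_000_000L, 100_000);
    long nonEmpty = maxNonEmptyChunks(20L, chunks);
    // prints "100 batches, at most 20 non-empty"
    System.out.println(chunks + " batches, at most " + nonEmpty + " non-empty");
  }
}
```

With 20 changed records in a 10-million-row object, at least 80 of the 100 batches return nothing, yet each still counts against the daily batch allocation.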
Changes
SalesforceBatchSource.java: Configured the PK chunking threshold dynamically using chunkSize.
SalesforceSplitUtil.java: Added the REST Query Plan explain check and removed unused constants.
SalesforceQueryUtil.java: Replaced magic HTTP status codes with constants.
SalesforceSourceConstants.java: Removed the unused threshold constant.
SalesforceQueryUtilTest.java: Added unit tests covering the Query Plan and fallback paths.
SalesforceBatchSourceETLTest.java: Renamed an ETL test case to remove UI naming.
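One way the pre-flight check could be structured is sketched below, under stated assumptions. It targets the Salesforce REST query resource's explain parameter, whose response contains a plans array with cardinality and relativeCost fields (a relativeCost above 1 means the query is not selective). The class names, the apiVersion argument, the relativeCost < 1 selectivity test, and the choice to keep PK chunking when the explain call fails are all assumptions; the PR's actual SalesforceSplitUtil logic may differ, and JSON parsing is omitted.

```java
import java.net.HttpURLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PkChunkDecision {
  private PkChunkDecision() {}

  /** Hypothetical holder for the two Query Plan fields this sketch reads. */
  static final class QueryPlan {
    final long cardinality;    // estimated number of matching records
    final double relativeCost; // above 1 means the query is not selective

    QueryPlan(long cardinality, double relativeCost) {
      this.cardinality = cardinality;
      this.relativeCost = relativeCost;
    }
  }

  /** Builds the REST explain path; apiVersion (for example "59.0") is caller-supplied. */
  static String explainPath(String apiVersion, String soql) {
    return "/services/data/v" + apiVersion + "/query/?explain="
        + URLEncoder.encode(soql, StandardCharsets.UTF_8);
  }

  /**
   * Assumed rule: skip PK chunking only when the explain call succeeded, the plan is
   * selective, and the estimated row count is below chunkSize. Otherwise keep chunking.
   */
  static boolean shouldUsePkChunking(int httpStatus, QueryPlan plan, int chunkSize) {
    // Named constant instead of a magic 200, in the spirit of the SalesforceQueryUtil change.
    if (httpStatus != HttpURLConnection.HTTP_OK || plan == null) {
      return true; // fallback: explain failed, keep PK chunking enabled
    }
    boolean selective = plan.relativeCost < 1.0;
    boolean smallResult = plan.cardinality < chunkSize;
    return !(selective && smallResult);
  }
}
```

Falling back to chunking on failure preserves the behavior the user configured; failing the other way would risk running a very large query un-chunked.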