[enhance](cloud) proactively sync tablet meta after alter#61585
luwei16 wants to merge 7 commits into apache:master
FE now sends a sync_tablet_meta RPC to all alive cloud backends after alter updates tablet meta in meta service. The request carries the affected tablet ids and is dispatched as a best-effort notification, so alter success still depends on the meta service update rather than on backend acknowledgements.
BE handles the RPC by refreshing meta only for tablets that are already cached locally. Uncached tablets are skipped, which avoids polluting the tablet cache while still fixing stale compaction policy and related tablet meta on active compute clusters. The RPC also returns synced/skipped/failed counts and exposes bvar counters for observability.
This change adds FE and BE unit tests and a cloud regression suite. The regression covers cached and uncached multi-cluster behavior, the negative path with proactive notify disabled, and the version-limit scenario where a size_based table hits too many versions, is altered to time_series, and can accept new writes immediately after the alter.
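The BE-side behavior described above (refresh only cached tablets, skip cache misses, report synced/skipped/failed counts) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchTablet`, `SketchTabletCache`, `SyncCounts`, and `sync_cached_tablets` are hypothetical names, and the cache is modeled as a plain map rather than the real tablet manager.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached cloud tablet; not the real Doris class.
struct SketchTablet {
    bool sync_ok = true;  // simulates whether refreshing meta succeeds
    int sync_calls = 0;   // how many times a refresh was attempted

    bool sync_meta() {
        ++sync_calls;
        return sync_ok;
    }
};

// Counts mirroring the synced/skipped/failed values the description mentions.
struct SyncCounts {
    int64_t synced = 0;
    int64_t skipped = 0;
    int64_t failed = 0;
};

// Assumed cache shape: tablet id -> cached tablet.
using SketchTabletCache = std::unordered_map<int64_t, std::shared_ptr<SketchTablet>>;

// Refresh meta only for tablets already present in the local cache.
// A cache miss is counted as skipped and never loads the tablet,
// so the notification cannot grow the cache.
SyncCounts sync_cached_tablets(SketchTabletCache& cache,
                               const std::vector<int64_t>& tablet_ids) {
    SyncCounts counts;
    for (int64_t id : tablet_ids) {
        auto it = cache.find(id);
        if (it == cache.end()) {
            ++counts.skipped;
            continue;
        }
        if (it->second->sync_meta()) {
            ++counts.synced;
        } else {
            ++counts.failed;
        }
    }
    return counts;
}
```

Using `find` rather than `operator[]` is what keeps a lookup from inserting an entry for an uncached tablet id.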
run buildall
TPC-H: Total hot run time: 26984 ms
TPC-DS: Total hot run time: 167520 ms
### What problem does this PR solve?
Issue Number: None
Related PR: apache#61585
Problem Summary: Newly added cloud tablet sync_meta tests expect tablet_schema.is_in_memory to be refreshed together with other tablet meta fields, but sync_meta only updated compaction and ttl related properties.
### Release note
None
### Check List (For Author)
- Test: Partial verification only
    - BE unit test build started with ./run-be-ut.sh --run --filter=CloudInternalServiceTest.*:CloudTabletMgrTest.TestGetTabletIfCachedOnlyReturnsCachedTablet:CloudTabletSyncMetaTest.* -j8, but the full result was not available in this session because first-time worktree dependency compilation (openblas/faiss) was still running
- Behavior changed: Yes (cloud tablet meta refresh now also syncs tablet_schema.is_in_memory)
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: apache#61585
Problem Summary: The previous follow-up fix made CloudTablet::sync_meta refresh tablet_schema.is_in_memory, but this property should not be synchronized in cloud tablet meta refresh. Adjust the BE UT to validate the intended behavior while keeping ttl synchronization covered.
### Release note
None
### Check List (For Author)
- Test: Partial verification only
    - Updated the affected BE unit test expectation; full ./run-be-ut.sh verification was still compiling first-time worktree dependencies in this session
- Behavior changed: Yes (sync_meta no longer updates tablet_schema.is_in_memory; UT now asserts that behavior)
- Does this need documentation: No
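To make the outcome of the two follow-up commits concrete, here is a rough sketch of a refresh that copies compaction- and ttl-related properties while leaving is_in_memory alone. The struct and its field names (`SketchTabletMeta`, `compaction_policy`, `ttl_seconds`) and the function `refresh_synced_fields` are illustrative assumptions, not the actual CloudTablet::sync_meta code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified tablet meta; field names are illustrative only.
struct SketchTabletMeta {
    std::string compaction_policy;
    int64_t ttl_seconds = 0;
    bool is_in_memory = false;
};

// Copy only the compaction- and ttl-related properties from the fresh meta.
// is_in_memory is deliberately not synchronized, matching the second commit.
// Returns true if any synchronized field changed.
bool refresh_synced_fields(SketchTabletMeta& local, const SketchTabletMeta& fresh) {
    bool changed = false;
    if (local.compaction_policy != fresh.compaction_policy) {
        local.compaction_policy = fresh.compaction_policy;
        changed = true;
    }
    if (local.ttl_seconds != fresh.ttl_seconds) {
        local.ttl_seconds = fresh.ttl_seconds;
        changed = true;
    }
    return changed;
}
```

A unit test in this style would assert both that the ttl value is updated and that is_in_memory keeps its old value after the refresh.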
run buildall
TPC-H: Total hot run time: 26803 ms
TPC-DS: Total hot run time: 169435 ms
run buildall
TPC-H: Total hot run time: 26750 ms
TPC-DS: Total hot run time: 169803 ms
run beut
run beut
run buildall
TPC-H: Total hot run time: 29678 ms
TPC-DS: Total hot run time: 180426 ms
run p0
run beut
run buildall
PR approved by anyone and no changes requested.
    int64_t synced = 0;
    int64_t skipped = 0;
    int64_t failed = 0;
    g_cloud_sync_tablet_meta_requests_total << 1;
Log the request and response, including the tablet list and the time taken.
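One possible shape for what this comment asks for is sketched below: a helper that formats the tablet list, the result counts, and the elapsed time into a single line, plus a small timer built on `std::chrono::steady_clock`. The names `format_sync_log` and `time_ms` and the log line layout are assumptions for illustration; the resulting string would be handed to whatever logging macro the handler already uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Build a one-line summary of the request (tablet ids) and response (counts),
// together with the elapsed time in milliseconds.
std::string format_sync_log(const std::vector<int64_t>& tablet_ids, int64_t synced,
                            int64_t skipped, int64_t failed, int64_t elapsed_ms) {
    std::ostringstream os;
    os << "sync_tablet_meta tablet_ids=[";
    for (std::size_t i = 0; i < tablet_ids.size(); ++i) {
        if (i > 0) {
            os << ",";
        }
        os << tablet_ids[i];
    }
    os << "] synced=" << synced << " skipped=" << skipped << " failed=" << failed
       << " cost_ms=" << elapsed_ms;
    return os.str();
}

// Run a callable and return how long it took in milliseconds,
// measured with a monotonic clock so wall-clock adjustments do not skew it.
template <typename Fn>
int64_t time_ms(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

For requests carrying many tablet ids, truncating the printed list to a fixed number of entries would keep log lines bounded; that choice is left open here.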
        return;
    }

    InternalService.PSyncTabletMetaRequest request = InternalService.PSyncTabletMetaRequest.newBuilder()
Log the request, including the table and tablet list, and the time taken.