[feat](snapshot) Support storage vault for create/list snapshot#62523
[feat](snapshot) Support storage vault for create/list snapshot#62523wyxxxcat wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run feut |
7c0435e to
aa4b30c
Compare
|
/review |
There was a problem hiding this comment.
Found 2 blocking issues.
CloudSnapshotHandler.submitJobchanges thecloud_snapshot_handler_classextension-point signature from(long, String)to(long, String, String). Because the handler is loaded reflectively from config, existing custom handlers compiled against the old API will no longer override the invoked method, andADMIN CREATE CLUSTER SNAPSHOTwill fall back to the baseNotImplementedExceptionafter upgrade.ClusterOptions.enableStorageVaultis currently a no-op in the docker regression framework. The new code only injectsENABLE_STORAGE_VAULT=1, but nothing reads that variable, and non-SQL-mode cloud startup still creates the instance viacreate_doris_instance()withobj_info, which keepsenable_storage_vault=false.
Critical checkpoints:
- Goal of this PR: partially met. The schema/parser surface changes are present, but the reviewed runtime paths do not safely deliver end-to-end storage-vault support.
- Scope/focus: small and focused, but incomplete because the new flag is ineffective and the handler API change is incompatible.
- Concurrency/locking: no new concurrency risks in the touched code.
- Lifecycle / extension points: broken. The reflective
cloud_snapshot_handler_classhook is not backward compatible. - Configuration changes: no new config items.
- Compatibility / rolling upgrade: not preserved because the handler method signature changed.
- Parallel code paths: not handled consistently. The regression docker startup has SQL-mode and non-SQL-mode flows, but
enableStorageVaultis only passed as an unused env var. - Special checks / validation:
vault_namevalidation itself is fine. - Test coverage: insufficient. Added tests only cover local validation/schema filling; there is no end-to-end test proving storage-vault snapshot create/list behavior or the new docker option.
- Test result changes: none to review.
- Observability: no additional issue found in the touched code.
- Transaction / persistence: not applicable in the reviewed diff.
- Data-write atomicity / crash safety: not applicable in the reviewed diff.
- FE-BE / protocol propagation: new proto fields were added, but the runtime paths above are not proven end-to-end by the current implementation/tests.
- Performance: no material issue found in the touched code.
Please address the compatibility break and wire the regression-framework option through a real startup path before merging.
aa4b30c to
f46a758
Compare
|
/review |
There was a problem hiding this comment.
Findings:
docker/runtime/doris-compose/cluster.py: this patch removesimport uuid, butMS.docker_env()still usesuuid.uuid4()to buildDORIS_CLOUD_PREFIX. Starting a cloud cluster now fails withNameErrorbefore the meta-service container comes up.fe/fe-core/src/main/java/org/apache/doris/cloud/snapshot/CloudSnapshotHandler.java: changingsubmitJob(long, String)tosubmitJob(long, String, String)breaks existing custom handlers loaded viacloud_snapshot_handler_class. Those handlers will still instantiate, butADMIN CREATE CLUSTER SNAPSHOTnow dispatches to the new base overload and hitsNotImplementedExceptionunless every implementation is rebuilt and updated.docker/runtime/doris-compose/resource/common.sh: the new storage-vault branch creates the instance in vault mode, but it still leavesdefault_storage_vault_nameempty on the normal startup path.SuiteClusterdefaultssqlModeNodeMgrtofalse, so nothing later creates/sets a default vault there; ordinary DDL will still fail withNo default storage vaultunless every table specifiesstorage_vault_nameexplicitly.fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java: exposingVAULT_IDunconditionally is not mixed-version safe.SchemaScanOperatorX::prepare()requires every FE slot to exist in the BE scanner, so a new FE queryinginformation_schema.cluster_snapshotsagainst an old BE will fail withno match column for this column(VAULT_ID)during rolling upgrade.
Critical checkpoint conclusions:
- Goal of the task: Partially accomplished. The parser/schema pieces were updated, but the create/list snapshot storage-vault flow is not safely completed end to end because of the handler compatibility break and the incomplete docker/test path. No end-to-end test demonstrates create/list snapshot working in storage-vault mode.
- Scope/minimality: The change set is focused on snapshot/vault plumbing, but the docker-compose changes introduce unrelated regressions in cluster startup and test-cluster initialization.
- Concurrency: No new concurrency-specific issue was identified in the changed code.
- Lifecycle / extension points:
CloudSnapshotHandleris instantiated via reflection from config, so its method signature is effectively part of a public extension point. The new signature is not backward compatible. - Configuration: No new config item is added.
- Compatibility: There are blocking compatibility issues for both configured snapshot handlers and FE/BE mixed-version schema scans.
- Parallel code paths: The SQL node-manager path creates a default vault, but the normal startup path does not. The new behavior is inconsistent across those two paths.
- Special conditions / checks: The new
ENABLE_STORAGE_VAULTbranch is insufficient by itself because it does not establish a default vault for the common regression-test flow. - Test coverage: Insufficient for the new behavior. The added tests only cover parser/schema scanner updates; nothing exercises docker startup, non-SQL-mode vault clusters, configured custom handlers, or mixed-version upgrade behavior.
- Test result files: Not applicable.
- Observability: No major new observability gap was introduced beyond the functional regressions above.
- Transaction / persistence: No new transaction- or edit-log-specific defect was identified in the touched code.
- Data write / modification paths: The changed test-cluster initialization path now provisions vault-mode instances that still cannot serve ordinary default-vault DDL.
- New variables across components:
VAULT_ID/resource_idwas added to the FE/BE schema-scan contract, but rollout compatibility was not handled. - Performance: No material performance issue was identified in the touched code.
- Other issues: None beyond the findings above.
Overall opinion: request changes.
| import jsonpickle | ||
| import os | ||
| import os.path | ||
| import uuid |
There was a problem hiding this comment.
Removing import uuid here breaks the unchanged uuid.uuid4() call in MS.docker_env() later in this file. Starting a cloud cluster now raises NameError: name 'uuid' is not defined before the meta-service container can come up.
f46a758 to
72318d0
Compare
72318d0 to
a7ab688
Compare
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)