Skip to content

[feat](snapshot) Support storage vault for create/list snapshot#62523

Draft
wyxxxcat wants to merge 1 commit intoapache:masterfrom
wyxxxcat:snapshot-storage-vault/cm_vault
Draft

[feat](snapshot) Support storage vault for create/list snapshot#62523
wyxxxcat wants to merge 1 commit intoapache:masterfrom
wyxxxcat:snapshot-storage-vault/cm_vault

Conversation

@wyxxxcat
Copy link
Copy Markdown
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run feut

@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from 7c0435e to aa4b30c Compare April 16, 2026 02:08
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 blocking issues.

  1. CloudSnapshotHandler.submitJob changes the cloud_snapshot_handler_class extension-point signature from (long, String) to (long, String, String). Because the handler is loaded reflectively from config, existing custom handlers compiled against the old API will no longer override the invoked method, and ADMIN CREATE CLUSTER SNAPSHOT will fall back to the base NotImplementedException after upgrade.
  2. ClusterOptions.enableStorageVault is currently a no-op in the docker regression framework. The new code only injects ENABLE_STORAGE_VAULT=1, but nothing reads that variable, and non-SQL-mode cloud startup still creates the instance via create_doris_instance() with obj_info, which keeps enable_storage_vault=false.

Critical checkpoints:

  • Goal of this PR: partially met. The schema/parser surface changes are present, but the reviewed runtime paths do not safely deliver end-to-end storage-vault support.
  • Scope/focus: small and focused, but incomplete because the new flag is ineffective and the handler API change is incompatible.
  • Concurrency/locking: no new concurrency risks in the touched code.
  • Lifecycle / extension points: broken. The reflective cloud_snapshot_handler_class hook is not backward compatible.
  • Configuration changes: no new config items.
  • Compatibility / rolling upgrade: not preserved because the handler method signature changed.
  • Parallel code paths: not handled consistently. The regression docker startup has SQL-mode and non-SQL-mode flows, but enableStorageVault is only passed as an unused env var.
  • Special checks / validation: vault_name validation itself is fine.
  • Test coverage: insufficient. Added tests only cover local validation/schema filling; there is no end-to-end test proving storage-vault snapshot create/list behavior or the new docker option.
  • Test result changes: none to review.
  • Observability: no additional issue found in the touched code.
  • Transaction / persistence: not applicable in the reviewed diff.
  • Data-write atomicity / crash safety: not applicable in the reviewed diff.
  • FE-BE / protocol propagation: new proto fields were added, but the runtime paths above are not proven end-to-end by the current implementation/tests.
  • Performance: no material issue found in the touched code.

Please address the compatibility break and wire the regression-framework option through a real startup path before merging.

@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from aa4b30c to f46a758 Compare April 16, 2026 02:54
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings:

  1. docker/runtime/doris-compose/cluster.py: this patch removes import uuid, but MS.docker_env() still uses uuid.uuid4() to build DORIS_CLOUD_PREFIX. Starting a cloud cluster now fails with NameError before the meta-service container comes up.
  2. fe/fe-core/src/main/java/org/apache/doris/cloud/snapshot/CloudSnapshotHandler.java: changing submitJob(long, String) to submitJob(long, String, String) breaks existing custom handlers loaded via cloud_snapshot_handler_class. Those handlers will still instantiate, but ADMIN CREATE CLUSTER SNAPSHOT now dispatches to the new base overload and hits NotImplementedException unless every implementation is rebuilt and updated.
  3. docker/runtime/doris-compose/resource/common.sh: the new storage-vault branch creates the instance in vault mode, but it still leaves default_storage_vault_name empty on the normal startup path. SuiteCluster defaults sqlModeNodeMgr to false, so nothing later creates/sets a default vault there; ordinary DDL will still fail with No default storage vault unless every table specifies storage_vault_name explicitly.
  4. fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java: exposing VAULT_ID unconditionally is not mixed-version safe. SchemaScanOperatorX::prepare() requires every FE slot to exist in the BE scanner, so a new FE querying information_schema.cluster_snapshots against an old BE will fail with no match column for this column(VAULT_ID) during rolling upgrade.

Critical checkpoint conclusions:

  • Goal of the task: Partially accomplished. The parser/schema pieces were updated, but the create/list snapshot storage-vault flow is not safely completed end to end because of the handler compatibility break and the incomplete docker/test path. No end-to-end test demonstrates create/list snapshot working in storage-vault mode.
  • Scope/minimality: The change set is focused on snapshot/vault plumbing, but the docker-compose changes introduce unrelated regressions in cluster startup and test-cluster initialization.
  • Concurrency: No new concurrency-specific issue was identified in the changed code.
  • Lifecycle / extension points: CloudSnapshotHandler is instantiated via reflection from config, so its method signature is effectively part of a public extension point. The new signature is not backward compatible.
  • Configuration: No new config item is added.
  • Compatibility: There are blocking compatibility issues for both configured snapshot handlers and FE/BE mixed-version schema scans.
  • Parallel code paths: The SQL node-manager path creates a default vault, but the normal startup path does not. The new behavior is inconsistent across those two paths.
  • Special conditions / checks: The new ENABLE_STORAGE_VAULT branch is insufficient by itself because it does not establish a default vault for the common regression-test flow.
  • Test coverage: Insufficient for the new behavior. The added tests only cover parser/schema scanner updates; nothing exercises docker startup, non-SQL-mode vault clusters, configured custom handlers, or mixed-version upgrade behavior.
  • Test result files: Not applicable.
  • Observability: No major new observability gap was introduced beyond the functional regressions above.
  • Transaction / persistence: No new transaction- or edit-log-specific defect was identified in the touched code.
  • Data write / modification paths: The changed test-cluster initialization path now provisions vault-mode instances that still cannot serve ordinary default-vault DDL.
  • New variables across components: VAULT_ID / resource_id was added to the FE/BE schema-scan contract, but rollout compatibility was not handled.
  • Performance: No material performance issue was identified in the touched code.
  • Other issues: None beyond the findings above.

Overall opinion: request changes.

import jsonpickle
import os
import os.path
import uuid
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing import uuid here breaks the unchanged uuid.uuid4() call in MS.docker_env() later in this file. Starting a cloud cluster now raises NameError: name 'uuid' is not defined before the meta-service container can come up.

Comment thread docker/runtime/doris-compose/resource/common.sh
Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java
@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from f46a758 to 72318d0 Compare April 16, 2026 03:51
@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from 72318d0 to a7ab688 Compare April 16, 2026 03:51
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (2/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (12/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (27444/37205)
Line Coverage 57.43% (296696/516660)
Region Coverage 54.66% (247248/452322)
Branch Coverage 56.32% (107162/190267)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants