Immutable folder support in DABs #5254

Open
andrewnester wants to merge 20 commits into main from demo-immutable

@andrewnester andrewnester commented May 15, 2026

Contributor

Changes

Added support for deploying bundles to immutable folders in the workspace.

Enable it by setting the following in the bundle configuration:

bundle:
  deployment: 
    immutable_folder: true

Why

Tests

Added acceptance tests.

@andrewnester andrewnester requested a review from pietern May 15, 2026 09:56
@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented May 28, 2026

Collaborator

Integration test report

Commit: 3a0ec41

Run: 28019874707

| Env | 🟨 KNOWN | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- |
| 🟨 aws linux | 1 | 216 | 99 | 3:18 |
| 🟨 aws windows | 1 | 218 | 97 | 2:45 |
| 🟨 aws-ucws linux | 1 | 297 | 18 | 3:26 |
| 🟨 aws-ucws windows | 1 | 299 | 16 | 3:50 |
| 🟨 azure linux | 1 | 216 | 98 | 3:19 |
| 🟨 azure windows | 1 | 218 | 96 | 2:44 |
| 🟨 azure-ucws linux | 1 | 299 | 15 | 3:57 |
| 🟨 azure-ucws windows | 1 | 301 | 13 | 3:56 |
| 🟨 gcp linux | 1 | 215 | 100 | 2:52 |
| 🟨 gcp windows | 1 | 217 | 98 | 2:31 |

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K |

@andrewnester andrewnester marked this pull request as ready for review June 1, 2026 11:34
@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Approval status: pending

/acceptance/bundle/ - needs approval

23 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @shreyas-goenka, @lennartkats-db, @anton-107

/bundle/ - needs approval

26 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @shreyas-goenka, @lennartkats-db, @anton-107

/cmd/bundle/ - needs approval

Files: cmd/bundle/utils/process.go
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @shreyas-goenka, @lennartkats-db, @anton-107

/libs/sync/ - needs approval

Files: libs/sync/sync.go
Suggested: @simonfaltum
Also eligible: @tanmay-db, @Divyansh-db, @renaudhartert-db, @parthban-db, @hectorcast-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

General files (require maintainer)

Files: acceptance/bin/print_requests.py, libs/testserver/handlers.go
Based on git history:

  • @denik -- recent work in bundle/phases/, bundle/config/mutator/, bundle/config/

Any maintainer (@anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@andrewnester andrewnester requested a review from denik June 2, 2026 10:01

@shreyas-goenka shreyas-goenka left a comment

Contributor

Minor comments - other than the bit where we use metadata.json

Comment thread bundle/phases/deploy.go Outdated
@@ -15,6 +15,9 @@ type Bundle struct {

type Workspace struct {
FilePath string `json:"file_path"`
// SnapshotPath is the workspace path of the immutable snapshot uploaded
// during deployment. Only populated for bundles with bundle.immutable = true.
SnapshotPath string `json:"snapshot_path,omitempty"`

Contributor

Will the UI use the immutable snapshot path? In that case we'll need to add it to DMS as well.

Comment thread bundle/deploy/snapshot/upload.go
Comment thread bundle/deploy/snapshot/client.go
Comment thread bundle/deploy/snapshot/path_test.go Outdated
Comment thread bundle/deploy/metadata/load.go Outdated
Comment thread bundle/config/mutator/translate_paths.go Outdated

// Perform resolution only if the path starts with one of the specified prefixes.
if slices.ContainsFunc(prefixes, path.HasPrefix) {
if slices.Contains(m.excludePaths, path.String()) {

Contributor

Very nitpicky: can we make this more robust? This matches substrings, so "abc" would match both "abc" and "abcd". Normally in the codebase, paths refer to exact paths, not patterns or substrings.
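
To illustrate the concern in general terms: a raw string prefix check treats "abcd" as if it were under "abc", while a check on path-segment boundaries does not. The sketch below is not code from this PR; `hasPathPrefix` is a hypothetical helper operating on plain strings, and the real mutator may already compare structured paths.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPathPrefix reports whether p equals prefix or lies underneath it,
// comparing on "/" boundaries instead of raw string prefixes.
// Hypothetical helper for illustration only; not part of the PR.
func hasPathPrefix(p, prefix string) bool {
	// Tolerate a single trailing slash on the prefix.
	prefix = strings.TrimSuffix(prefix, "/")
	if p == prefix {
		return true
	}
	return strings.HasPrefix(p, prefix+"/")
}

func main() {
	fmt.Println(strings.HasPrefix("abcd", "abc")) // true: raw prefix match
	fmt.Println(hasPathPrefix("abcd", "abc"))     // false: not on a segment boundary
	fmt.Println(hasPathPrefix("abc/d", "abc"))    // true: child of abc
	fmt.Println(hasPathPrefix("abc", "abc"))      // true: exact match
}
```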

Comment thread bundle/config/mutator/resourcemutator/process_static_resources.go Outdated
Comment thread acceptance/bundle/deploy/immutable/output.txt Outdated

@pietern pietern left a comment

Contributor

I see the existing tests pass against cloud, but recommend including a testserver implementation already. Makes it easier to iterate.

Comment thread bundle/phases/build.go
Comment thread bundle/deploy/snapshot/path.go Outdated
}
return 0
})
return files, nil

Contributor

The logic that collects the list of files looks very similar to what we do in libs/sync. Any chance we can let that pkg take care of building the list and we refer to it here? This set of files is also configured as b.Files for telemetry. If we chase the path that sets it we might be able to reuse it?

Comment thread bundle/deploy/snapshot/client.go
}

// The real API uses the workspace user UUID (not email) in the snapshot path,
// matching service-principal identities used in cloud acceptance tests.

Contributor

Optional: should we also use UUIDs here for better fidelity? The benefits are minor if we have cloud coverage. The difference is that users always have CAN_MANAGE on their home directory, but that's likely not true for /Users/userId?

Comment thread bundle/permissions/workspace_root.go
Comment thread bundle/internal/schema/annotations.yml Outdated
Whether to fail on active runs. If this is set to true a deployment that is running can be interrupted.
"immutable_folder":
"description": |-
Whether to upload bundle files and artifacts as a single immutable snapshot. When true, all files are packaged into a zip and uploaded via the snapshot API, and workspace.file_path and workspace.artifact_path are set to the returned content-addressed path. The validate and plan commands make no mutative API calls when this is enabled.

Contributor

The Snapshots API is internal. We should not refer to it in the docs.
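
For readers unfamiliar with the term "content-addressed" used in the annotation: the idea is that the path is derived from the uploaded content, so identical inputs map to the same location. The sketch below shows one generic way to compute such a digest over a set of files. It is an assumption-laden illustration: `contentHash` is a hypothetical function, and the PR does not state how the service actually derives the hash in the returned path.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contentHash returns a hex SHA-256 digest over all file paths and contents.
// Hypothetical illustration; the real snapshot hash is produced by the
// service and its algorithm is not described in this PR.
func contentHash(files map[string][]byte) string {
	// Sort paths so the digest does not depend on map iteration order.
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		// Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
		fmt.Fprintf(h, "%d:%s%d:", len(p), p, len(files[p]))
		h.Write(files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"src/notebook.py": []byte("print('hello')"),
		"databricks.yml":  []byte("bundle:\n  name: demo\n"),
	}
	fmt.Println(contentHash(files))
}
```

Changing any byte of any file, or renaming a file, yields a different digest, which is what makes a path built from it safe to treat as immutable.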

Comment thread bundle/deploy/snapshot/translate_paths.go
func (m *translateResourcePaths) Name() string { return "snapshot.TranslateResourcePaths" }

func (m *translateResourcePaths) Apply(_ context.Context, b *bundle.Bundle) diag.Diagnostics {
localPrefix := b.SyncRootPath + "/"

Contributor

Maybe strings.TrimSuffix("/") as well to account for trailing /? Or is that not possible?
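
A minimal sketch of that suggestion, assuming the sync root is a plain slash-separated string; `localPrefix` is a hypothetical helper name, and whether `b.SyncRootPath` can actually carry a trailing slash is not established in this thread. Note that `strings.TrimSuffix` takes the string and the suffix as two arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// localPrefix returns syncRoot followed by one "/", whether or not the
// input already ends in a single slash.
// Hypothetical helper for illustration only; not part of the PR.
func localPrefix(syncRoot string) string {
	// TrimSuffix removes at most one trailing "/" before we append our own.
	return strings.TrimSuffix(syncRoot, "/") + "/"
}

func main() {
	fmt.Println(localPrefix("/home/me/bundle"))  // /home/me/bundle/
	fmt.Println(localPrefix("/home/me/bundle/")) // /home/me/bundle/
}
```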

Comment thread bundle/deploy/snapshot/state.go Outdated
func (s *loadState) Name() string { return "snapshot.LoadState" }

func (s *saveState) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
if b.Config.Workspace.SnapshotPath == "" {

Contributor

Do we not need to store this remotely? The code reads like it only does local storage but we need it in remote as well.

Contributor

This sounds like something we can add to resources.json

Comment thread bundle/deploy/snapshot/state.go Outdated

// SaveState writes the snapshot path to the local deployment state directory
// so it can be recovered during destroy without reading metadata.json.
func SaveState() bundle.Mutator {

Contributor

Consider integrating this with deployment WAL? If we can record the snapshot upload event we can avoid reuploading it on a subsequent deployment since it already exists.

Contributor Author

We can't really, can we? The deployment WAL is calculated before the deployment executes, as part of the plan, but we build and upload later in the phase.

Contributor

I assume we write to the WAL during deployments, so surely we should be able to do that during or after the file upload? Otherwise, how do we capture that the plan was partially applied?

@shreyas-goenka shreyas-goenka Jun 19, 2026

Contributor

I could be misunderstanding the WAL - I'm not familiar with it (will look into it) - I just assumed it records actions as we do them.

Contributor Author

Sorry, I misunderstood you and thought you were talking about the execution graph, not the WAL :) The WAL does make sense here, but it's currently limited to resource creation only. I think it might be worth expanding it to more deploy steps in general, but that is outside the scope of this PR.

>>> [CLI] jobs get [NUMID]
"/Workspace/Users/[UUID]/.snapshots/test-bundle-immutable-no-artifacts-[UNIQUE_NAME]/[SNAPSHOT_HASH]/src/files/src/notebook"

>>> [CLI] bundle destroy --auto-approve

Contributor

can we assert that destroy deletes the snapshot? Even when .databricks is removed?

// Updates (dynamic): resources.* (strings) (resolves variable references to their actual values)
// Resolves variable references in 'resources' using bundle, workspace, and variables prefixes
mutator.ResolveVariableReferencesOnlyResources(),
resourceResolver,

Contributor

why this diff?

@@ -0,0 +1,16 @@
Local = false

Contributor

we can run locally as well? As long as we fix the snapshot API impl in test server?

}

func (m *overrideImmutableFolder) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
if env.Get(ctx, "DATABRICKS_IMMUTABLE_FOLDER") == "" {

Contributor

For later: move to bundle/env to centralize all env vars we use, and infix with _BUNDLE_.

Comment thread bundle/config/mutator/override_immutable_folder_test.go Outdated
state.Files = fl

// Persist the snapshot path so destroy on a different machine can find it.
state.SnapshotPath = b.Config.Workspace.SnapshotPath

Contributor

This state file is not needed for destroy today because we remove the file path recursively.

Storing a different path means there is proper state to keep track of.

Can this be implemented as a (hidden?) direct engine resource instead? Then we keep track of it next to the other state, benefit from the WAL, can auto-resolve paths on deployment, etc.

Contributor Author

Agreed, this might be useful. The only limitation I see is that this would only be available for the direct engine, while now it works for both.
Also, I believe we don't strictly need this in this revision of the feature and can follow up on it later.

Comment thread libs/sync/sync.go