Immutable folder support in DABs#5254
Conversation
Integration test reportCommit: 3a0ec41
|
Approval status: pending
|
shreyas-goenka
left a comment
There was a problem hiding this comment.
Minor comments - other than the bit where we use metadata.json
| @@ -15,6 +15,9 @@ type Bundle struct { | |||
|
|
|||
| type Workspace struct { | |||
| FilePath string `json:"file_path"` | |||
| // SnapshotPath is the workspace path of the immutable snapshot uploaded | |||
| // during deployment. Only populated for bundles with bundle.immutable = true. | |||
| SnapshotPath string `json:"snapshot_path,omitempty"` | |||
There was a problem hiding this comment.
Will the UI use the immutable snapshot path? In that case we'll need to add it to DMS as well.
|
|
||
| // Perform resolution only if the path starts with one of the specified prefixes. | ||
| if slices.ContainsFunc(prefixes, path.HasPrefix) { | ||
| if slices.Contains(m.excludePaths, path.String()) { |
There was a problem hiding this comment.
very nitpicky: Make this more robust? This matches substrings - so "abc" would match "abc" and "abcd". Normally in the codebase - paths refer to exact paths, not patterns or substrings.
pietern
left a comment
There was a problem hiding this comment.
I see the existing tests pass against cloud, but recommend including a testserver implementation already. Makes it easier to iterate.
| } | ||
| return 0 | ||
| }) | ||
| return files, nil |
There was a problem hiding this comment.
The logic that collects the list of files looks very similar to what we do in libs/sync. Any chance we can let that pkg take care of building the list and we refer to it here? This set of files is also configured as b.Files for telemetry. If we chase the path that sets it we might be able to reuse it?
| } | ||
|
|
||
| // The real API uses the workspace user UUID (not email) in the snapshot path, | ||
| // matching service-principal identities used in cloud acceptance tests. |
There was a problem hiding this comment.
optional: Should we also use UUIDs here for better fidelity? The benefits are minor if we have cloud coverage. The difference being users have CAN_MANAGE always on their home directory but that's likely not true for /Users/userId?
| Whether to fail on active runs. If this is set to true a deployment that is running can be interrupted. | ||
| "immutable_folder": | ||
| "description": |- | ||
| Whether to upload bundle files and artifacts as a single immutable snapshot. When true, all files are packaged into a zip and uploaded via the snapshot API, and workspace.file_path and workspace.artifact_path are set to the returned content-addressed path. The validate and plan commands make no mutative API calls when this is enabled. |
There was a problem hiding this comment.
Snapshots API is internal. We should not refer to this in the docs.
| func (m *translateResourcePaths) Name() string { return "snapshot.TranslateResourcePaths" } | ||
|
|
||
| func (m *translateResourcePaths) Apply(_ context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| localPrefix := b.SyncRootPath + "/" |
There was a problem hiding this comment.
Maybe strings.TrimSuffix("/") as well to account for trailing /? Or is that not possible?
| func (s *loadState) Name() string { return "snapshot.LoadState" } | ||
|
|
||
| func (s *saveState) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| if b.Config.Workspace.SnapshotPath == "" { |
There was a problem hiding this comment.
Do we not need to store this remotely? The code reads like it only does local storage but we need it in remote as well.
There was a problem hiding this comment.
This sounds like something we can add to resources.json
|
|
||
| // SaveState writes the snapshot path to the local deployment state directory | ||
| // so it can be recovered during destroy without reading metadata.json. | ||
| func SaveState() bundle.Mutator { |
There was a problem hiding this comment.
Consider integrating this with deployment WAL? If we can record the snapshot upload event we can avoid reuploading it on a subsequent deployment since it already exists.
There was a problem hiding this comment.
We can't really, can we? The deployment WAL calculated before the execution of deployment as part of plan but we build and upload later in the phase
There was a problem hiding this comment.
I assume during deployments we write to WAL so surely we should be able to do that during / after file upload? Otherwise do how do we capture if the plan was partially applied.
There was a problem hiding this comment.
I could be misunderstanding the WAL - I'm not familiar with it (will look into it) - I just assumed it records actions as we do them.
There was a problem hiding this comment.
Sorry, I misunderstood you and thought you're talking about execution graph and not WAL :) WAL indeed makes sense but it's currently limitted to only creation of resources. I think it might worth expanding it generally to more steps of deploy but this is outside of the scope
| >>> [CLI] jobs get [NUMID] | ||
| "/Workspace/Users/[UUID]/.snapshots/test-bundle-immutable-no-artifacts-[UNIQUE_NAME]/[SNAPSHOT_HASH]/src/files/src/notebook" | ||
|
|
||
| >>> [CLI] bundle destroy --auto-approve |
There was a problem hiding this comment.
can we assert that destroy deletes the snapshot? Even when .databricks is removed?
| // Updates (dynamic): resources.* (strings) (resolves variable references to their actual values) | ||
| // Resolves variable references in 'resources' using bundle, workspace, and variables prefixes | ||
| mutator.ResolveVariableReferencesOnlyResources(), | ||
| resourceResolver, |
| @@ -0,0 +1,16 @@ | |||
| Local = false | |||
There was a problem hiding this comment.
we can run locally as well? As long as we fix the snapshot API impl in test server?
| } | ||
|
|
||
| func (m *overrideImmutableFolder) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| if env.Get(ctx, "DATABRICKS_IMMUTABLE_FOLDER") == "" { |
There was a problem hiding this comment.
For later: move to bundle/env to centralize all env vars we use, and infix with _BUNDLE_.
| state.Files = fl | ||
|
|
||
| // Persist the snapshot path so destroy on a different machine can find it. | ||
| state.SnapshotPath = b.Config.Workspace.SnapshotPath |
There was a problem hiding this comment.
This state file is not needed for destroy today because we remove the file path recursively.
Storing a different path means there is proper state to keep track of.
Can this be implemented as a (hidden?) direct engine resource instead? Then we keep track of it next to the other state, benefit from the WAL, can auto-resolve paths on deployment, etc.
There was a problem hiding this comment.
Agreed, this might be useful. The only limitation I see it is that this will be only available for direct engine while now it's both.
Also I believe we don't strictly need this in the version revision of the feature and can follow up with this later
Changes
Added support for deploying bundles to immutable folders in the workspace
Enabled by using
Why
Tests
Added an acceptance tests