bundle/fuzz: add create-payload parity fuzz test for terraform vs direct#5686
Draft
radakam wants to merge 3 commits into
Draft
bundle/fuzz: add create-payload parity fuzz test for terraform vs direct#5686radakam wants to merge 3 commits into
radakam wants to merge 3 commits into
Conversation
Implements the first technique from DECO-25361: generate random job
configs and check for differences in the create payload between the
terraform and direct deploy engines.
Both engines run the same `bundle deploy` pipeline in-process (via
testcli) against a testserver, differing only in DATABRICKS_BUNDLE_ENGINE,
and the POST /api/2.2/jobs/create body each sends is captured and diffed.
Because only the engine differs, shared mutators cancel out and any
remaining diff is a genuine engine divergence.
The fuzzer already surfaced two real (benign) divergences, documented in
DefaultIgnorePaths:
- num_workers: 0 is sent explicitly by terraform but dropped by direct
(omitempty).
- the terraform provider strips the deprecated spark conf
"spark.databricks.delta.preview.enabled"; direct forwards it.
Run with: go test ./bundle/fuzz -run TestJobCreateParity
(FUZZ_SEEDS overrides the seed count; auto-skips when terraform is not
provisioned via acceptance/install_terraform.py).
…ignore Address golangci-lint failures (intrange loops, strconv.FormatBool over fmt.Sprintf) and tighten the create-payload ignore list: drop the dead job_clusters num_workers entry (those are at parity) and document the task-level num_workers divergence as a real CLI gap to fix separately.
Collaborator
Integration test reportCommit: 2d8ea6d
|
- Add a `test-fuzz` task and a nightly CI job that provisions terraform and runs the create-payload parity tests. They previously always skipped because terraform was never provisioned in the test path. - Ignore repo-root build/ so the provisioned terraform binary and provider mirror are not accidentally committed. - Skip cleanly when build/ is only partially provisioned (missing provider mirror or .terraformrc) instead of failing mid-deploy. - Document that the harness covers jobs only for now (DECO-25361).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changes
Adds a fuzz-testing harness (
bundle/fuzz) that verifies theterraformanddirectdeploy engines produce equivalent job create payloads from the same bundle config — the first technique from DECO-25361.generate.go/rand.go— seeded, reproducible randomresources.Jobgenerator (math/rand/v2PCG). Produces deploy-safe configs across a wide field surface: schedules/triggers/continuous, email & webhook notifications, health rules, git sources, job clusters, and tasks (notebook, spark-python, python-wheel, condition) with retries, libraries, and richnew_clusterspecs.capture.go/capture_deploy.go— runs a realbundle deployfor each engine against an in-processtestserverand captures thePOST /api/2.2/jobs/createbody. Both engines go through the identical mutator pipeline, so shared transformations cancel out in the diff. Terraform is auto-skipped when not provisioned (RequireTerraform).compare.go— JSON payload diff (numeric-aware viaUseNumber, bracket-quoted paths for dotted map keys likespark_confentries) with a small, justifiedDefaultIgnorePaths.fuzz_test.go— seed-driven property test (TestJobCreateParity,FUZZ_SEEDSoverride) plus a nativeFuzzJobCreateParitytarget for deep ad-hoc runs.Two divergences surfaced by the fuzzer:
tasks[*].new_cluster.num_workers(single-node task cluster) — terraform sendsnum_workers: 0, direct omits it. This is a real, currently-unhandled gap:JobClustersFixups.initializeNumWorkersforce-sendsnum_workersforjob_clustersbut is not applied to task-levelnew_cluster. The knownjob_clusterscase is already at parity (verified — no ignore needed). Ignored here with a comment; the product fix tocluster_fixups.gowill follow in a separate PR so this stays purely about the fuzz harness.spark_conf["spark.databricks.delta.preview.enabled"]— terraform's provider strips this key; direct forwards it. Backend ignores it either way; benign provider-side filter, ignored.Why
We want confidence that the
directengine is behaviorally equivalent toterraformbefore it becomes the default. Hand-written acceptance tests only cover known shapes; randomized generation surfaces divergences in field combinations no one thought to test — as proven by the task-levelnum_workersgap above.Tests
go test ./bundle/fuzz/passes (terraform-backed; ~50s).task lint-q→ 0 issues.python3 acceptance/install_terraform.py --targetdir build.