bundle/fuzz: add create-payload parity fuzz test for terraform vs direct#5686

Draft
radakam wants to merge 3 commits into main from deco-25361-fuzz-create-payload

@radakam radakam commented Jun 23, 2026

Contributor

Changes

Adds a fuzz-testing harness (bundle/fuzz) that verifies the terraform and direct deploy engines produce equivalent job create payloads from the same bundle config — the first technique from DECO-25361.

  • generate.go / rand.go — seeded, reproducible random resources.Job generator (math/rand/v2 PCG). Produces deploy-safe configs across a wide field surface: schedules/triggers/continuous, email & webhook notifications, health rules, git sources, job clusters, and tasks (notebook, spark-python, python-wheel, condition) with retries, libraries, and rich new_cluster specs.
  • capture.go / capture_deploy.go — runs a real bundle deploy for each engine against an in-process testserver and captures the POST /api/2.2/jobs/create body. Both engines go through the identical mutator pipeline, so shared transformations cancel out in the diff. Terraform is auto-skipped when not provisioned (RequireTerraform).
  • compare.go — JSON payload diff (numeric-aware via UseNumber, bracket-quoted paths for dotted map keys like spark_conf entries) with a small, justified DefaultIgnorePaths.
  • fuzz_test.go — seed-driven property test (TestJobCreateParity, FUZZ_SEEDS override) plus a native FuzzJobCreateParity target for deep ad-hoc runs.
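The comparison step described for compare.go can be sketched as follows. This is a minimal illustration and not the PR's code: the names `decode`, `childPath`, and `diff` are hypothetical, and numbers are compared by their literal text, whereas the real harness may normalize them differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// decode parses JSON with UseNumber so numeric values are kept as
// json.Number instead of being converted to float64.
func decode(raw []byte) (any, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	var v any
	err := dec.Decode(&v)
	return v, err
}

// childPath appends a map key to a path. Keys that contain dots are
// bracket-quoted so they cannot be confused with nested fields.
func childPath(parent, key string) string {
	if strings.Contains(key, ".") {
		return fmt.Sprintf("%s[%q]", parent, key)
	}
	if parent == "" {
		return key
	}
	return parent + "." + key
}

// diff returns the sorted list of paths at which a and b differ.
func diff(path string, a, b any) []string {
	var out []string
	switch av := a.(type) {
	case map[string]any:
		bv, ok := b.(map[string]any)
		if !ok {
			return []string{path}
		}
		keys := map[string]bool{}
		for k := range av {
			keys[k] = true
		}
		for k := range bv {
			keys[k] = true
		}
		for k := range keys {
			x, inA := av[k]
			y, inB := bv[k]
			if !inA || !inB {
				out = append(out, childPath(path, k))
				continue
			}
			out = append(out, diff(childPath(path, k), x, y)...)
		}
	case []any:
		bv, ok := b.([]any)
		if !ok || len(av) != len(bv) {
			return []string{path}
		}
		for i := range av {
			out = append(out, diff(fmt.Sprintf("%s[%d]", path, i), av[i], bv[i])...)
		}
	case json.Number:
		// Assumption: compare the literal number text, so 1 and 1.0 differ.
		bv, ok := b.(json.Number)
		if !ok || av.String() != bv.String() {
			return []string{path}
		}
	default:
		// Strings, booleans, and null are directly comparable.
		if a != b {
			return []string{path}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	tf, _ := decode([]byte(`{"new_cluster":{"num_workers":0,"spark_conf":{"spark.master":"local[*]"}}}`))
	direct, _ := decode([]byte(`{"new_cluster":{"spark_conf":{"spark.master":"local[*]","spark.databricks.delta.preview.enabled":"true"}}}`))
	for _, p := range diff("", tf, direct) {
		fmt.Println(p)
	}
}
```

On the two sample payloads this reports `new_cluster.num_workers` and `new_cluster.spark_conf["spark.databricks.delta.preview.enabled"]`. An ignore list such as DefaultIgnorePaths could then be applied by filtering the returned paths.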

Two divergences surfaced by the fuzzer:

  1. tasks[*].new_cluster.num_workers (single-node task cluster) — terraform sends num_workers: 0, direct omits it. This is a real, currently-unhandled gap: JobClustersFixups.initializeNumWorkers force-sends num_workers for job_clusters but is not applied to task-level new_cluster. The known job_clusters case is already at parity (verified — no ignore needed). Ignored here with a comment; the product fix to cluster_fixups.go will follow in a separate PR so this stays purely about the fuzz harness.
  2. spark_conf["spark.databricks.delta.preview.enabled"] — terraform's provider strips this key; direct forwards it. Backend ignores it either way; benign provider-side filter, ignored.
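The first divergence comes down to how a zero value is serialized. The sketch below uses two hypothetical structs (`clusterOmit` and `clusterForce`, not types from the CLI or SDK) to show how an `omitempty` tag drops `num_workers: 0` while an untagged field sends it. The real code path uses a force-send mechanism rather than a second struct, so treat this only as an illustration of the effect.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterOmit is a hypothetical struct whose num_workers field is
// omitted from the JSON output when it holds the zero value.
type clusterOmit struct {
	SparkVersion string `json:"spark_version"`
	NumWorkers   int    `json:"num_workers,omitempty"`
}

// clusterForce is a hypothetical struct that always serializes
// num_workers, including an explicit 0 for a single-node cluster.
type clusterForce struct {
	SparkVersion string `json:"spark_version"`
	NumWorkers   int    `json:"num_workers"`
}

// marshal encodes v as JSON and panics on error to keep the sketch short.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Payload shape resembling what direct sends for a task-level cluster.
	fmt.Println(marshal(clusterOmit{SparkVersion: "15.4.x-scala2.12"}))
	// Payload shape resembling what terraform sends.
	fmt.Println(marshal(clusterForce{SparkVersion: "15.4.x-scala2.12"}))
}
```

The first line printed has no `num_workers` key and the second carries `"num_workers":0`, which is the difference the fuzzer reports at tasks[*].new_cluster.num_workers.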

Why

We want confidence that the direct engine is behaviorally equivalent to terraform before it becomes the default. Hand-written acceptance tests cover only known shapes; randomized generation surfaces divergences in field combinations no one thought to test, as the task-level num_workers gap above demonstrates.

Tests

  • go test ./bundle/fuzz/ passes (terraform-backed; ~50s).
  • task lint-q → 0 issues.
  • Terraform provisioned via python3 acceptance/install_terraform.py --targetdir build.

Implements the first technique from DECO-25361: generate random job
configs and check for differences in the create payload between the
terraform and direct deploy engines.

Both engines run the same `bundle deploy` pipeline in-process (via
testcli) against a testserver, differing only in DATABRICKS_BUNDLE_ENGINE,
and the POST /api/2.2/jobs/create body each sends is captured and diffed.
Because only the engine differs, shared mutators cancel out and any
remaining diff is a genuine engine divergence.
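The capture idea can be sketched with the standard library alone. This is not the PR's testserver or testcli wiring: `capture`, `captureCreate`, and `post` are hypothetical names, and `post` stands in for whatever an engine does during deploy.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// capture records every request body sent to the jobs create endpoint.
type capture struct {
	mu     sync.Mutex
	bodies [][]byte
}

// handler serves POST /api/2.2/jobs/create and stores the raw body.
func (c *capture) handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/2.2/jobs/create", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		c.mu.Lock()
		c.bodies = append(c.bodies, body)
		c.mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"job_id": 1}`)
	})
	return mux
}

// captureCreate starts a throwaway server, lets send talk to it, and
// returns the create bodies that were recorded.
func captureCreate(send func(baseURL string) error) ([][]byte, error) {
	c := &capture{}
	srv := httptest.NewServer(c.handler())
	defer srv.Close()
	if err := send(srv.URL); err != nil {
		return nil, err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.bodies, nil
}

// post is a stand-in for an engine: it sends one create request.
func post(payload string) func(string) error {
	return func(baseURL string) error {
		resp, err := http.Post(baseURL+"/api/2.2/jobs/create", "application/json", bytes.NewBufferString(payload))
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		return nil
	}
}

func main() {
	bodies, err := captureCreate(post(`{"name":"job","max_concurrent_runs":1}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d body: %s\n", len(bodies), bodies[0])
}
```

Running the same capture once per engine yields two bodies that can be diffed directly, since everything except the engine is held constant.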

The fuzzer already surfaced two real (benign) divergences, documented in
DefaultIgnorePaths:
  - num_workers: 0 is sent explicitly by terraform but dropped by direct
    (omitempty).
  - the terraform provider strips the deprecated spark conf
    "spark.databricks.delta.preview.enabled"; direct forwards it.

Run with: go test ./bundle/fuzz -run TestJobCreateParity
(FUZZ_SEEDS overrides the seed count; auto-skips when terraform is not
provisioned via acceptance/install_terraform.py).
@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 07:44 — with GitHub Actions Inactive
…ignore

Address golangci-lint failures (intrange loops, strconv.FormatBool over
fmt.Sprintf) and tighten the create-payload ignore list: drop the dead
job_clusters num_workers entry (those are at parity) and document the
task-level num_workers divergence as a real CLI gap to fix separately.
@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 08:06 — with GitHub Actions Inactive
eng-dev-ecosystem-bot commented Jun 23, 2026

Collaborator

Integration test report

Commit: 2d8ea6d

Run: 28017488291

| Env | 🟨 KNOWN | 🔄 flaky | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- |
| 🟨 aws linux | 1 | | 216 | 99 | 3:09 |
| 🟨 aws windows | 1 | | 218 | 97 | 2:46 |
| 🟨 aws-ucws linux | 1 | 2 | 295 | 18 | 3:36 |
| 🟨 aws-ucws windows | 1 | | 299 | 16 | 3:39 |
| 🟨 azure linux | 1 | | 216 | 98 | 3:04 |
| 🟨 azure windows | 1 | | 218 | 96 | 2:38 |
| 🟨 azure-ucws linux | 1 | | 299 | 15 | 3:42 |
| 🟨 azure-ucws windows | 1 | | 301 | 13 | 3:24 |
| 🟨 gcp linux | 1 | | 215 | 100 | 2:49 |
| 🟨 gcp windows | 1 | | 217 | 98 | 2:39 |

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K |
| 🔄 TestFilerWorkspaceNotebook | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p |
| 🔄 TestFilerWorkspaceNotebook/scalaNb.scala | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p |

- Add a `test-fuzz` task and a nightly CI job that provisions terraform
  and runs the create-payload parity tests. They previously always
  skipped because terraform was never provisioned in the test path.
- Ignore repo-root build/ so the provisioned terraform binary and
  provider mirror are not accidentally committed.
- Skip cleanly when build/ is only partially provisioned (missing
  provider mirror or .terraformrc) instead of failing mid-deploy.
- Document that the harness covers jobs only for now (DECO-25361).
@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 09:47 — with GitHub Actions Inactive