
[feature] add beforeDeleteHelm module hook binding#754

Open
Fkuloff wants to merge 2 commits intomainfrom
feature/before-delete-helm-binding

Conversation


@Fkuloff Fkuloff commented Apr 27, 2026

Overview

Adds a new beforeDeleteHelm module hook binding that fires just before helm uninstall in DeleteModule, symmetric to beforeHelm / helm install. Hook failure aborts deletion: helm uninstall and afterDeleteHelm are not executed and converge retries with backoff. The hook is skipped when the module has no Helm release (call sits inside the if releaseExists branch in module_manager.go). Snapshots from kubernetes bindings are populated the same way as for the other Helm-lifecycle bindings.

Wired through every hook flavor:

  • shell hook — YAML config + JSON schema for v0/v1
  • batch hook — module-sdk binary (DTO field consumed via BatchHook.GetBeforeDeleteHelm)
  • in-binary Go hook — addon-operator/sdk registry path
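For the shell flavor, the declaration would look like the following sketch — assuming the new v1 schema field mirrors beforeHelm / afterDeleteHelm, where the value is the execution order:

```yaml
configVersion: v1
beforeDeleteHelm: 10
```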

What this PR does / why we need it

This binding is the native, addon-operator-side replacement for chart-level Helm pre-delete hooks. The hook runs in the operator process with full access to module values, snapshots, the patch collector, and metrics — instead of as a Job inside the cluster. Typical use cases that used to require chart-level pre-delete: removing finalizers from CRs before the chart drops their CRDs, revoking external integrations (Vault tokens, GitHub webhooks) before the pod with credentials is gone, taking a backup of state, or cleaning up resources the chart does not own.
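The finalizer-removal use case boils down to submitting a merge patch that nulls out metadata.finalizers before the chart drops the CRDs. A minimal, self-contained sketch of that payload (illustrative only — the real module-sdk patch-collector API is not shown here, and the function name is hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildFinalizerClearPatch returns a merge-patch body that clears
// metadata.finalizers — the kind of payload a beforeDeleteHelm hook could
// submit through the patch collector before the chart's CRDs are dropped.
// In a JSON merge patch (RFC 7386), null deletes the field.
func buildFinalizerClearPatch() ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"finalizers": nil, // null removes all finalizers on merge
		},
	}
	return json.Marshal(patch)
}

func main() {
	p, err := buildFinalizerClearPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p)) // {"metadata":{"finalizers":null}}
}
```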

Related to:

End-to-end test in a Deckhouse cluster

Test module: external a-k-test v0.0.4, registered through a ModuleSource. A single batch hook (compiled with module-sdk) declares the new binding plus a kubernetes binding on a chart-rendered marker ConfigMap, and reads forceFail from config values to decide whether to return an error:

import (
    "context"
    "fmt"

    "github.com/deckhouse/module-sdk/pkg"
)

var beforeDeleteHelmConfig = &pkg.HookConfig{
    OnBeforeDeleteHelm: &pkg.OrderedConfig{Order: 10},
    Kubernetes:         []pkg.KubernetesConfig{{ /* marker CM in default ns */ }},
}

func handler(_ context.Context, input *pkg.HookInput) error {
    forceFail := input.ConfigValues.Get("aKTest.forceFail").Bool()
    input.Logger.Info("BDH_TEST: beforeDeleteHelm fired", "force_fail", forceFail)
    // ... log snapshot count + entries ...
    if forceFail {
        return fmt.Errorf("BDH_TEST: intentional failure to verify abort-on-error semantics")
    }
    return nil
}
Scenario 1 — hook fires before helm uninstall

After dcmd a-k-test (with forceFail: false), binding=beforeDeleteHelm runs in queue=main, then helm uninstall removes the chart resources. The structured binding field clearly distinguishes the firing from kubernetes/Synchronization that happens on module enable.

{"msg":"Module hook start","binding":"beforeDeleteHelm","hook":"hooks/batchhooks:batch/before_delete:0","module":"a-k-test","queue":"main","path":"/deckhouse/downloaded/a-k-test/v0.0.4/hooks/batchhooks","time":"2026-04-27T15:55:52Z"}
{"msg":"BDH_TEST: beforeDeleteHelm fired","binding":"beforeDeleteHelm","hook_force_fail":false,"hook_ts":"2026-04-27T15:55:52.604507587Z"}

Post-condition: marker ConfigMap, Deployment, Service, Ingress are all gone after the hook returned nil.

Scenario 2 — hook failure aborts deletion (abort-on-error)

With spec.settings.forceFail: true in ModuleConfig, dcmd a-k-test produces repeated module-delete failures and never calls helm uninstall:

{"msg":"BDH_TEST: returning intentional failure (forceFail=true)","binding":"beforeDeleteHelm","hook_force_fail":true}
{"level":"error","logger":"addon-operator.module-delete","msg":"Module delete failed, requeue task to retry after delay.","count":1,"error":"run beforeDeleteHelm hooks: module hook 'hooks/batchhooks:batch/before_delete:0' failed: ...BDH_TEST: intentional failure to verify abort-on-error semantics..."}
{"msg":"Module delete failed, requeue task to retry after delay.","count":2, ... }
{"msg":"Module delete failed, requeue task to retry after delay.","count":3, ... }
{"msg":"Module delete failed, requeue task to retry after delay.","count":4, ... }
{"msg":"Module delete failed, requeue task to retry after delay.","count":5, ... }
{"msg":"Module delete failed, requeue task to retry after delay.","count":6, ... }

Stack trace points at addon-operator/pkg/task/tasks/module-delete/task.go:103. Throughout the backoff loop (~12 min observed) the chart's resources stay alive:

$ k get cm,deploy,svc,ingress -n default -l module=a-k-test
NAME                       DATA   AGE
cm/a-k-test-marker         2      12m
deploy/a-k-test-server     1/1    12m
svc/a-k-test                      12m
ingress/a-k-test                  12m

No helm uninstall / DeleteRelease log line for a-k-test is ever emitted. Flipping forceFail back to false lets the next retry succeed and deletion proceeds normally.

Scenario 3 — kubernetes-binding snapshot is populated at hook time

The hook subscribes to a chart-rendered marker ConfigMap and logs snapshot contents on each fire. With v0.0.3 (no marker in templates) hook_count was 0; with v0.0.4 (marker present) it became 1 and exposed the actual ConfigMap name — proving snapshots are alive at beforeDeleteHelm time and the snapshot-injection condition added to RunHooksByBinding covers the new binding:

{"msg":"BDH_TEST: marker snapshot","binding":"beforeDeleteHelm","hook_count":1}
{"msg":"BDH_TEST: marker entry","binding":"beforeDeleteHelm","hook_idx":0,"hook_value":"\"a-k-test-marker\""}
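The snapshot-injection condition mentioned above can be pictured as a predicate over Helm-lifecycle bindings that beforeDeleteHelm now joins. The binding names below come from this PR's text, but the type, constants, and function are an illustrative stand-in, not addon-operator's actual code:

```go
package main

import "fmt"

// BindingType is an illustrative stand-in for addon-operator's binding
// identifiers; the real constants live in the hook types package.
type BindingType string

const (
	BeforeHelm       BindingType = "beforeHelm"
	AfterHelm        BindingType = "afterHelm"
	AfterDeleteHelm  BindingType = "afterDeleteHelm"
	BeforeDeleteHelm BindingType = "beforeDeleteHelm"
	OnStartup        BindingType = "onStartup"
)

// wantsKubernetesSnapshots sketches the condition this PR extends:
// Helm-lifecycle bindings get kubernetes-binding snapshots injected
// when run via RunHooksByBinding, and beforeDeleteHelm joins that set.
func wantsKubernetesSnapshots(b BindingType) bool {
	switch b {
	case BeforeHelm, AfterHelm, AfterDeleteHelm, BeforeDeleteHelm:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(wantsKubernetesSnapshots(BeforeDeleteHelm)) // true
	fmt.Println(wantsKubernetesSnapshots(OnStartup))        // false
}
```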
Scenario 4 — hook is skipped when the module has no Helm release

Verified by code review: in module_manager.go DeleteModule the new RunHooksByBinding(ctx, BeforeDeleteHelm, ...) call is placed inside the if releaseExists branch, so a module with no chart (or whose release was already gone) skips the hook entirely — symmetric with skipping helm uninstall itself.

if releaseExists, err := ... .IsReleaseExists(ml.GetName()); !releaseExists {
    // log "Cannot find helm release for module" and skip uninstall
} else {
    // beforeDeleteHelm is fired only here
    if err := ml.RunHooksByBinding(ctx, BeforeDeleteHelm, deleteLogLabels); err != nil { ... }
    // helm uninstall
}

This was not exercised in the cluster on its own (would require a separate test module with empty templates/), since the surrounding logic for the no-release case was already covered by afterDeleteHelm and is unchanged.

Scenario 5 — works through module-sdk batch hook (not just shell)

The entire end-to-end test ran on the batch hook path: the test module ships a Go binary compiled with module-sdk, which dumps its config containing beforeDeleteHelm: 10, addon-operator parses that JSON via BatchHook.GetBeforeDeleteHelm, registers the binding, and fires it on delete. So scenarios 1–3 are all simultaneous proof that the SDK ↔ addon-operator type/DTO contract works end-to-end (OnBeforeDeleteHelm → DTO beforeDeleteHelm → typed accessor → registered binding → fired hook).
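The DTO leg of that contract — JSON dump on the SDK side, typed accessor on the addon-operator side — can be sketched with plain encoding/json. The beforeDeleteHelm field name comes from this PR; the struct shape and accessor below are assumptions for illustration, not the actual module-sdk types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// batchHookConfig is an illustrative stand-in for the module-sdk config DTO:
// the hook binary dumps its config as JSON and addon-operator reads the
// beforeDeleteHelm order back out. A pointer distinguishes "absent" from 0.
type batchHookConfig struct {
	BeforeDeleteHelm *int `json:"beforeDeleteHelm,omitempty"`
}

// getBeforeDeleteHelm mimics a typed accessor in the spirit of
// BatchHook.GetBeforeDeleteHelm: present => (order, true), absent => (0, false).
func getBeforeDeleteHelm(raw []byte) (int, bool, error) {
	var cfg batchHookConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return 0, false, err
	}
	if cfg.BeforeDeleteHelm == nil {
		return 0, false, nil
	}
	return *cfg.BeforeDeleteHelm, true, nil
}

func main() {
	order, ok, _ := getBeforeDeleteHelm([]byte(`{"beforeDeleteHelm":10}`))
	fmt.Println(order, ok) // 10 true
	_, ok, _ = getBeforeDeleteHelm([]byte(`{}`))
	fmt.Println(ok) // false
}
```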

Scenarios summary

| # | Scenario | Result |
|---|----------|--------|
| 1 | hook fires before helm uninstall | ✅ verified in cluster (binding=beforeDeleteHelm, queue=main) |
| 2 | hook failure aborts uninstall, converge retries with backoff | ✅ verified in cluster (6+ retries, release intact) |
| 3 | kubernetes-binding snapshot populated at hook time | ✅ verified in cluster (hook_count=1) |
| 4 | hook skipped when no helm release | ✅ by code review (call sits inside if releaseExists branch) |
| 5 | works through module-sdk batch hook | ✅ entire e2e ran on the batch path |

Signed-off-by: Artem Kuleshov <artem.kuleshov@flant.com>
@Fkuloff Fkuloff requested review from ipaqsa and ldmonster April 27, 2026 10:44
@Fkuloff Fkuloff self-assigned this Apr 27, 2026
@Fkuloff Fkuloff added the enhancement New feature or request label Apr 27, 2026
@Fkuloff Fkuloff marked this pull request as ready for review April 27, 2026 16:49