refactor: remove verification manager and related components from wor…#821
refactor: remove verification manager and related components from wor…#821
Conversation
…kspace and job agents; update orchestration and job agent registry to streamline verification process
📝 WalkthroughWalkthroughThis PR removes the internal verification subsystem (manager, scheduler, hooks, measurement execution, and recording) and replaces it with a simpler database-backed abstraction. The verification.Manager is removed from the Argo job agent, release manager, and workspace, eliminating automatic verification lifecycle management and rollback-on-verification-failure behavior. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…roper file formatting
… job mapping; modify OpenAPI paths for deployment variable management and streamline deployment-related endpoints
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Free Tier Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Nil reconcileQueue causes panic in verification starter
- Added a nil check in DBVerificationStarter.StartVerification to return a clear error when the reconcile queue is not configured, preventing a panic.
Or push these changes by commenting:
@cursor push 3b27e89288
Preview (3b27e89288)
diff --git a/apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go b/apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go
--- a/apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go
+++ b/apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go
@@ -30,6 +30,10 @@
return nil
}
+ if s.Queue == nil {
+ return fmt.Errorf("reconcile queue is not configured; use workspace.WithReconcileQueue when creating the workspace")
+ }
+
queries := db.GetQueries(ctx)
jobUUID := uuid.MustParse(job.Id)| verificationStarter := &verificationaction.DBVerificationStarter{ | ||
| Queue: ws.reconcileQueue, | ||
| WorkspaceID: ws.ID, | ||
| } |
There was a problem hiding this comment.
Nil reconcileQueue causes panic in verification starter
High Severity
DBVerificationStarter is created with ws.reconcileQueue, which defaults to nil when WithReconcileQueue is not passed to workspace.New. Multiple existing callers (e.g., in resourceproviders_cache_test.go, resourceproviders_bench_test.go, dbtest.go) don't provide this option. When a policy-based verification triggers StartVerification with non-empty specs, calling s.Queue.Enqueue(...) on a nil Queue interface will panic. Unlike traceStore, which has a default in-memory implementation and an explicit nil-check panic in the release manager, reconcileQueue has neither a default nor a guard.
Additional Locations (1)
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
44-55: Add a constructor guard forjobAgentRegistry.
traceStoreis fail-fast validated, butjobAgentRegistryat Line 55 is not. A nil registry will fail later in execution paths instead of at construction.Suggested hardening
func New(store *store.Store, traceStore PersistenceStore, jobAgentRegistry *jobagents.Registry) *Manager { if traceStore == nil { panic("traceStore cannot be nil - deployment tracing is mandatory") } + if jobAgentRegistry == nil { + panic("jobAgentRegistry cannot be nil") + } policyManager := policy.New(store)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/workspace-engine/pkg/workspace/releasemanager/manager.go` around lines 44 - 55, The constructor New currently panics if traceStore is nil but does not validate jobAgentRegistry; add a fail-fast guard at the start of New to check if jobAgentRegistry == nil and panic with a clear message (e.g., "jobAgentRegistry cannot be nil - required for job execution/agent lookup"); ensure this check is placed alongside the traceStore validation so Manager, planner, eligibility, and executor are never built with a nil jobAgentRegistry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go`:
- Around line 42-63: The insert+enqueue sequence can leave orphaned metrics if
InsertJobVerificationMetric succeeds but s.Queue.Enqueue fails; fix by making
the operation atomic or compensating: either wrap the insert and enqueue in a
single transaction/outbox so the enqueue is recorded atomically with the metric,
or if you can't add an outbox, perform compensating cleanup by deleting the
inserted metric on enqueue failure (call the appropriate delete query with
metric.ID) and return the enqueue error; update the code around
InsertJobVerificationMetric and s.Queue.Enqueue
(reconcile.EnqueueParams/metric.ID) accordingly.
- Around line 33-35: Replace the panic-prone uuid.MustParse call with uuid.Parse
and propagate a proper error: call uuid.Parse(job.Id), check the returned error,
and if non-nil return an error (e.g., fmt.Errorf("invalid job id %q: %w",
job.Id, err)) from the current function so callers can handle it; update any
subsequent uses of jobUUID (the variable created after parsing) to rely on the
safely-parsed uuid.UUID value.
In `@apps/workspace-engine/pkg/workspace/workspace.go`:
- Around line 49-52: DBVerificationStarter.StartVerification currently calls
s.Queue.Enqueue(...) without guarding against a nil Queue
(DBVerificationStarter.Queue is an optional interface set by
WithReconcileQueue), which can panic; update StartVerification in
releasemanager/action/verification/starter.go to check if s.Queue != nil before
calling s.Queue.Enqueue (if nil, either no-op or log a debug/info message and
return), ensuring DBVerificationStarter can be constructed via workspace.New()
without WithReconcileQueue; alternatively, if you prefer enforcing the option,
update workspace.New and all callers to require WithReconcileQueue(), but the
safer immediate fix is to add the nil guard around s.Queue.Enqueue in
StartVerification.
---
Nitpick comments:
In `@apps/workspace-engine/pkg/workspace/releasemanager/manager.go`:
- Around line 44-55: The constructor New currently panics if traceStore is nil
but does not validate jobAgentRegistry; add a fail-fast guard at the start of
New to check if jobAgentRegistry == nil and panic with a clear message (e.g.,
"jobAgentRegistry cannot be nil - required for job execution/agent lookup");
ensure this check is placed alongside the traceStore validation so Manager,
planner, eligibility, and executor are never built with a nil jobAgentRegistry.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (29)
apps/workspace-engine/pkg/workspace/jobagents/argo/argoapp.goapps/workspace-engine/pkg/workspace/jobagents/argo/argoapp_verifications.goapps/workspace-engine/pkg/workspace/jobagents/registry.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.goapps/workspace-engine/pkg/workspace/releasemanager/action/verification/verification.goapps/workspace-engine/pkg/workspace/releasemanager/action/verification/verification_test.goapps/workspace-engine/pkg/workspace/releasemanager/deployment/executor_test.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/pkg/workspace/releasemanager/manager_test.goapps/workspace-engine/pkg/workspace/releasemanager/verification/executor.goapps/workspace-engine/pkg/workspace/releasemanager/verification/executor_test.goapps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/verification/manager.goapps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.goapps/workspace-engine/pkg/workspace/releasemanager/verification/recorder.goapps/workspace-engine/pkg/workspace/releasemanager/verification/recorder_test.goapps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.goapps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler_test.goapps/workspace-engine/pkg/workspace/releasemanager/verification_hooks.goapps/workspace-engine/pkg/workspace/workflowmanager/manager_test.goapps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/svc/workspaceconsumer/consumer.goapps/workspace-engine/test/e2e/engine_job_verification_query_test.goapps/workspace-engine/test/e2e/engine_policy_rollback_evaluator_test.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/e2e/engine_verification_hooks_test.goapps/workspace-engine/test/e2e/engine_verification_lifecycle_test.goapps/workspace-engine/test/e2e/engine_workspace_accessors_test.go
💤 Files with no reviewable changes (18)
- apps/workspace-engine/test/e2e/engine_policy_rollback_evaluator_test.go
- apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/recorder_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/manager_test.go
- apps/workspace-engine/test/e2e/engine_job_verification_query_test.go
- apps/workspace-engine/test/e2e/engine_verification_hooks_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go
- apps/workspace-engine/test/e2e/engine_workspace_accessors_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification_hooks.go
- apps/workspace-engine/pkg/workspace/jobagents/argo/argoapp_verifications.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/manager.go
- apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/scheduler.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/executor_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/executor.go
- apps/workspace-engine/test/e2e/engine_verification_lifecycle_test.go
- apps/workspace-engine/pkg/workspace/releasemanager/verification/recorder.go
| queries := db.GetQueries(ctx) | ||
| jobUUID := uuid.MustParse(job.Id) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go | sed -n '1,100p'Repository: ctrlplanedev/ctrlplane
Length of output: 2882
Avoid panic on invalid job IDs.
Line 34 uses uuid.MustParse, which panics on malformed IDs. Since the function returns error, use uuid.Parse and return an error instead.
Proposed fix
- jobUUID := uuid.MustParse(job.Id)
+ jobUUID, err := uuid.Parse(job.Id)
+ if err != nil {
+ return fmt.Errorf("invalid job id %q: %w", job.Id, err)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| queries := db.GetQueries(ctx) | |
| jobUUID := uuid.MustParse(job.Id) | |
| queries := db.GetQueries(ctx) | |
| jobUUID, err := uuid.Parse(job.Id) | |
| if err != nil { | |
| return fmt.Errorf("invalid job id %q: %w", job.Id, err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go`
around lines 33 - 35, Replace the panic-prone uuid.MustParse call with
uuid.Parse and propagate a proper error: call uuid.Parse(job.Id), check the
returned error, and if non-nil return an error (e.g., fmt.Errorf("invalid job id
%q: %w", job.Id, err)) from the current function so callers can handle it;
update any subsequent uses of jobUUID (the variable created after parsing) to
rely on the safely-parsed uuid.UUID value.
| metric, err := queries.InsertJobVerificationMetric(ctx, db.InsertJobVerificationMetricParams{ | ||
| JobID: jobUUID, | ||
| Name: spec.Name, | ||
| Provider: providerJSON, | ||
| IntervalSeconds: spec.IntervalSeconds, | ||
| Count: int32(spec.Count), | ||
| SuccessCondition: spec.SuccessCondition, | ||
| SuccessThreshold: toPgInt4(spec.SuccessThreshold), | ||
| FailureCondition: toPgText(spec.FailureCondition), | ||
| FailureThreshold: toPgInt4(spec.FailureThreshold), | ||
| }) | ||
| if err != nil { | ||
| return fmt.Errorf("insert verification metric %q: %w", spec.Name, err) | ||
| } | ||
|
|
||
| if err := s.Queue.Enqueue(ctx, reconcile.EnqueueParams{ | ||
| WorkspaceID: s.WorkspaceID, | ||
| Kind: "job-verification-metric", | ||
| ScopeType: "job-verification-metric", | ||
| ScopeID: metric.ID.String(), | ||
| }); err != nil { | ||
| return fmt.Errorf("enqueue verification metric %q: %w", spec.Name, err) |
There was a problem hiding this comment.
Insert/enqueue sequence can leave orphaned verification metrics.
If metric insert succeeds but Queue.Enqueue fails, the metric row is persisted without reconcile work. This can drop verifications. Please make insert+enqueue atomic (single transaction/outbox flow) or add compensating cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go`
around lines 42 - 63, The insert+enqueue sequence can leave orphaned metrics if
InsertJobVerificationMetric succeeds but s.Queue.Enqueue fails; fix by making
the operation atomic or compensating: either wrap the insert and enqueue in a
single transaction/outbox so the enqueue is recorded atomically with the metric,
or if you can't add an outbox, perform compensating cleanup by deleting the
inserted metric on enqueue failure (call the appropriate delete query with
metric.ID) and return the enqueue error; update the code around
InsertJobVerificationMetric and s.Queue.Enqueue
(reconcile.EnqueueParams/metric.ID) accordingly.
| verificationStarter := &verificationaction.DBVerificationStarter{ | ||
| Queue: ws.reconcileQueue, | ||
| WorkspaceID: ws.ID, | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find and inspect DBVerificationStarter implementation for nil-queue handling
echo "=== DBVerificationStarter Implementation ==="
fd -e go -p "verification" apps/workspace-engine | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 1917
🏁 Script executed:
#!/bin/bash
# 2) Search for DBVerificationStarter type definition and Start method
rg -n "type\s+DBVerificationStarter|func\s+\(.*DBVerificationStarter.*\)\s+Start" apps/workspace-engine --type go -A 5Repository: ctrlplanedev/ctrlplane
Length of output: 1283
🏁 Script executed:
#!/bin/bash
# 3) Search for Queue field usage in DBVerificationStarter methods
rg -n "Queue" apps/workspace-engine --type go | grep -i "verification\|verificationstarter" | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 2180
🏁 Script executed:
#!/bin/bash
# 4) Inspect workspace.New() call sites and WithReconcileQueue usage
rg -n "workspace\.New\(" apps/workspace-engine --type go -B 2 -A 5Repository: ctrlplanedev/ctrlplane
Length of output: 9888
🏁 Script executed:
#!/bin/bash
# 5) Count WithReconcileQueue usage
rg -n "WithReconcileQueue" apps/workspace-engine --type goRepository: ctrlplanedev/ctrlplane
Length of output: 409
🏁 Script executed:
#!/bin/bash
# Read the full StartVerification method
cat -n apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go | head -80Repository: ctrlplanedev/ctrlplane
Length of output: 2821
🏁 Script executed:
#!/bin/bash
# Check reconcile.Queue type definition to understand if it can be nil safely
rg -n "type Queue interface|type Queue struct" apps/workspace-engine --type go -B 2 -A 8Repository: ctrlplanedev/ctrlplane
Length of output: 2880
🏁 Script executed:
#!/bin/bash
# Inspect workspace.go constructor and field initialization for reconcileQueue
cat -n apps/workspace-engine/pkg/workspace/workspace.go | head -100Repository: ctrlplanedev/ctrlplane
Length of output: 4123
Add nil check before Queue.Enqueue() call in DBVerificationStarter.StartVerification.
At line 57 of apps/workspace-engine/pkg/workspace/releasemanager/action/verification/starter.go, the code calls s.Queue.Enqueue() without checking if Queue is nil. Since Queue is an interface field on DBVerificationStarter and can remain uninitialized (nil) if WithReconcileQueue() is not provided during workspace.New(), this will panic at runtime with "value method called on nil interface."
The concern is validated: workspace initialization at multiple test call sites omits WithReconcileQueue(), and only one explicit production use of that option was found (consumer.go:89). Either:
- Add a nil guard in
StartVerificationbefore the Enqueue call, or - Ensure all
workspace.New()call sites consistently provideWithReconcileQueue()and document this requirement clearly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/workspace-engine/pkg/workspace/workspace.go` around lines 49 - 52,
DBVerificationStarter.StartVerification currently calls s.Queue.Enqueue(...)
without guarding against a nil Queue (DBVerificationStarter.Queue is an optional
interface set by WithReconcileQueue), which can panic; update StartVerification
in releasemanager/action/verification/starter.go to check if s.Queue != nil
before calling s.Queue.Enqueue (if nil, either no-op or log a debug/info message
and return), ensuring DBVerificationStarter can be constructed via
workspace.New() without WithReconcileQueue; alternatively, if you prefer
enforcing the option, update workspace.New and all callers to require
WithReconcileQueue(), but the safer immediate fix is to add the nil guard around
s.Queue.Enqueue in StartVerification.
… to clean up code and improve readability
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| return pgtype.Text{} | ||
| } | ||
| return pgtype.Text{String: *v, Valid: true} | ||
| } |
There was a problem hiding this comment.
Duplicated verification metric insert and helper functions
Low Severity
The insert-and-enqueue loop in DBVerificationStarter.StartVerification (marshal provider, InsertJobVerificationMetric, Queue.Enqueue) and the helper functions toPgInt4/toPgText are nearly identical to the existing logic in PostgresSetter.CreateJobWithVerification in setters_postgres.go. This duplication means any future bug fix or schema change needs to be applied in two places.
| rollbackHooks := rollback.NewRollbackHooks(store, jobAgentRegistry) | ||
| compositeHooks := verification.NewCompositeHooks(releaseManagerHooks, rollbackHooks) | ||
| verificationManager.SetHooks(compositeHooks) | ||
|
|
There was a problem hiding this comment.
In-memory verification store never populated after manager removal
Medium Severity
The removed verification.Manager was the sole writer to the in-memory store.JobVerifications. The new DBVerificationStarter writes only to the database via db.InsertJobVerificationMetric, not to the in-memory store. However, state_index.go, rollback/rollback.go, and versioncooldown/getters.go still read from store.JobVerifications.GetByJobId(), which will now always return empty. This means job verification data won't appear in API responses, and the rollback evaluator for verification failures will silently never trigger.



…kspace and job agents; update orchestration and job agent registry to streamline verification process
Note
High Risk
High risk because it removes core verification execution/restore paths and disables rollback-on-verification-failure behavior, changing release safety semantics and runtime flow. It also introduces new DB writes + queue enqueueing that must be correctly wired in production (e.g., reconcile queue availability).
Overview
Verification handling is refactored from an in-process scheduler to a DB/queue workflow.
VerificationActionnow depends on a newVerificationStarter(implemented byDBVerificationStarter) which persists verification metric specs to Postgres and enqueuesjob-verification-metricwork items.Verification manager wiring and verification-triggered rollback are removed. The Argo job agent no longer starts ArgoCD health verifications,
jobagents.Registryandreleasemanager.Managerdrop verification manager dependencies/restore logic, and rollback hooks/tests around rollback on verification failure are deleted; unit tests are updated to assert starter calls instead of in-memory verification records, and verification-related e2e coverage is removed.Written by Cursor Bugbot for commit 5031086. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
Refactor
Removed Features