Skip to content

fix(supervisor): compat shim for COMPUTE checkpoint type#3703

Merged
nicktrn merged 1 commit into
mainfrom
fix/supervisor-checkpoint-type-compat
May 22, 2026
Merged

fix(supervisor): compat shim for COMPUTE checkpoint type#3703
nicktrn merged 1 commit into
mainfrom
fix/supervisor-checkpoint-type-compat

Conversation

@nicktrn
Copy link
Copy Markdown
Collaborator

@nicktrn nicktrn commented May 22, 2026

Workloads bundled with CLI versions before v4.4.4 use a strict zod enum for checkpoint.type that only allows DOCKER and KUBERNETES. When a customer's runs are routed via the compute path, those old runners receive type: "COMPUTE" on /snapshots/since/... and /dequeue responses and fail validation - blocking silent migration of existing deployments.

The workload never reads the field - only validates the shape. Rewriting COMPUTE -> KUBERNETES on the way out lets older runners keep parsing while the database and internal services keep the real value. Limited to the two workload-facing endpoints whose response includes a checkpoint; /continue, /attempts/start, /attempts/complete all return shapes without one.

Followup to #3114.

@nicktrn nicktrn self-assigned this May 22, 2026
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 22, 2026

⚠️ No Changeset found

Latest commit: 3aa3c31

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

Review Change Stack

Walkthrough

This PR adds backward compatibility for checkpoint type responses in the supervisor workload server. A helper function legacifyCheckpointType conditionally rewrites checkpoint type values from "COMPUTE" to "KUBERNETES" while preserving the full response structure. The helper is integrated into two endpoints that return checkpoint data: the snapshots retrieval endpoint and the dequeue endpoint. A changelog entry documents the fix for checkpoint compatibility when workloads produce compute-path checkpoints.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description addresses the context and motivation well, but the PR description template sections (Testing, Changelog, Screenshots, Checklist) are not filled out. Complete the required sections of the description template: mark checklist items, add a Testing section describing how the change was validated, provide a Changelog summary, and remove placeholder text.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding a compatibility shim to handle the COMPUTE checkpoint type in the supervisor.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/supervisor-checkpoint-type-compat

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/supervisor/src/workloadServer/index.ts (1)

46-56: ⚡ Quick win

Add crumbs around the compat shim path for traceability.

Please add // @Crumbs (or a `// `#region` `@crumbs block) around this compatibility rewrite to match the repo’s tracing convention for new TS logic.

Proposed change
+// `@crumbs` checkpoint-type-compat: rewrite COMPUTE -> KUBERNETES for legacy runners
 function legacifyCheckpointType<T extends { checkpoint?: { type: string } | null }>(item: T): T {
   if (item.checkpoint?.type === "COMPUTE") {
     return { ...item, checkpoint: { ...item.checkpoint, type: "KUBERNETES" } } as T;
   }
   return item;
 }

As per coding guidelines, “Add crumbs as you write code using // @Crumbs comments or `// `#region` `@crumbs blocks for debug tracing.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/supervisor/src/workloadServer/index.ts` around lines 46 - 56, Wrap the
compatibility shim in a crumbs marker so it’s traceable: add a `// `@crumbs``
comment (or a `// `#region` `@crumbs`` / `// `#endregion`` pair) immediately
surrounding the legacifyCheckpointType function body/definition to mark the
compat rewrite for tracing; ensure the markers are adjacent to the function
declaration and cover the conditional that rewrites item.checkpoint.type (the
`item` parameter and its `checkpoint.type` handling) so the shim is easily
discoverable in traces.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/supervisor/src/workloadServer/index.ts`:
- Around line 46-56: Wrap the compatibility shim in a crumbs marker so it’s
traceable: add a `// `@crumbs`` comment (or a `// `#region` `@crumbs`` / `//
`#endregion`` pair) immediately surrounding the legacifyCheckpointType function
body/definition to mark the compat rewrite for tracing; ensure the markers are
adjacent to the function declaration and cover the conditional that rewrites
item.checkpoint.type (the `item` parameter and its `checkpoint.type` handling)
so the shim is easily discoverable in traces.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2199935f-b8e6-4ace-a4da-d4b1f20fdbd2

📥 Commits

Reviewing files that changed from the base of the PR and between 832cf72 and 3aa3c31.

📒 Files selected for processing (2)
  • .server-changes/supervisor-checkpoint-type-compat.md
  • apps/supervisor/src/workloadServer/index.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Always import tasks from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • apps/supervisor/src/workloadServer/index.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: In packages/core (@trigger.dev/core), import subpaths only, never import from root.
Add crumbs as you write code using // @Crumbs comments or `// `#region` `@crumbs blocks for debug tracing. They should be stripped by agentcrumbs strip before merge.

Files:

  • apps/supervisor/src/workloadServer/index.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/supervisor/src/workloadServer/index.ts
apps/supervisor/src/workloadServer/**/*.{js,ts}

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

HTTP server for workload communication (heartbeats, snapshots) should be implemented in src/workloadServer/

Files:

  • apps/supervisor/src/workloadServer/index.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting must be enforced using Prettier before committing

Files:

  • apps/supervisor/src/workloadServer/index.ts
🧠 Learnings (5)
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/supervisor-checkpoint-type-compat.md
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/supervisor/src/workloadServer/index.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/supervisor/src/workloadServer/index.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • apps/supervisor/src/workloadServer/index.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • apps/supervisor/src/workloadServer/index.ts
🔇 Additional comments (2)
apps/supervisor/src/workloadServer/index.ts (1)

399-401: LGTM!

Also applies to: 426-428

.server-changes/supervisor-checkpoint-type-compat.md (1)

1-6: LGTM!

@nicktrn nicktrn merged commit ddad970 into main May 22, 2026
23 checks passed
@nicktrn nicktrn deleted the fix/supervisor-checkpoint-type-compat branch May 22, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants