perf(run-engine,webapp): look up PENDING_VERSION runs via ClickHouse #3707

Merged
ericallam merged 5 commits into main from perf/pending-version-clickhouse-lookup
May 22, 2026

Conversation

@ericallam
Member

Summary

When a background worker registers, the engine resolves runs that were queued before the worker was ready (status PENDING_VERSION). That lookup used to scan a Postgres status index on TaskRun. Move it to ClickHouse: query candidate run ids from task_runs_v2, then refetch the actual rows from Postgres by primary key with a status = 'PENDING_VERSION' guard for idempotency.
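The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the in-memory `Map` stands in for the Postgres `TaskRun` table, `CandidateLookup` stands in for the ClickHouse query, and the target status `"PENDING"` and all type and function names are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real types live in the run engine.
type RunStatus = "PENDING_VERSION" | "PENDING" | "EXECUTING";
type TaskRunRow = { id: string; status: RunStatus };

// Stand-in for the ClickHouse query against task_runs_v2: returns candidate ids only.
type CandidateLookup = (backgroundWorkerId: string, limit: number) => Promise<string[]>;

// Stand-in for the Postgres refetch by primary key with the status guard.
function refetchPending(table: Map<string, TaskRunRow>, ids: string[]): TaskRunRow[] {
  const rows: TaskRunRow[] = [];
  for (const id of ids) {
    const row = table.get(id);
    // The guard drops candidates that the lookup still reports but that
    // Postgres has already moved out of PENDING_VERSION (or never had).
    if (row && row.status === "PENDING_VERSION") rows.push(row);
  }
  return rows;
}

async function resolvePendingVersionRuns(
  lookup: CandidateLookup,
  table: Map<string, TaskRunRow>,
  backgroundWorkerId: string,
  limit: number
): Promise<string[]> {
  const candidateIds = await lookup(backgroundWorkerId, limit);
  const pending = refetchPending(table, candidateIds);
  const promoted: string[] = [];
  for (const run of pending) {
    // Re-check at write time, mirroring an update guarded by
    // status = 'PENDING_VERSION'. "PENDING" is an assumed target status.
    const current = table.get(run.id);
    if (current && current.status === "PENDING_VERSION") {
      current.status = "PENDING";
      promoted.push(run.id);
    }
  }
  return promoted;
}
```

Because the guard is applied on both the read and the write, running the same resolution twice promotes each run at most once, which is the idempotency property the summary relies on.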

Design

The lookup is a pluggable interface on the run engine (PendingVersionRunIdLookup). The webapp wires a ClickHouse-backed implementation through the org-scoped clickhouseFactory using a new "engine" client type, configured by RUN_ENGINE_CLICKHOUSE_* env vars. The URL falls back to CLICKHOUSE_URL when unset, so self-hosted deployments don't need new config to keep working.
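The URL fallback might look something like the sketch below. The description only names the `RUN_ENGINE_CLICKHOUSE_*` prefix, so the exact variable `RUN_ENGINE_CLICKHOUSE_URL` and the helper name are assumptions; the environment is passed in as a plain object rather than read from `process.env`, in keeping with the repo's rule of going through the `env` export.

```typescript
// Hypothetical helper; the PR states only the RUN_ENGINE_CLICKHOUSE_* prefix,
// so the exact variable name used here is an assumption.
type Env = Record<string, string | undefined>;

function resolveEngineClickhouseUrl(env: Env): string | undefined {
  // Prefer the engine-specific URL; fall back to the shared one when unset or empty.
  const engineUrl = env["RUN_ENGINE_CLICKHOUSE_URL"];
  if (engineUrl !== undefined && engineUrl !== "") return engineUrl;
  return env["CLICKHOUSE_URL"];
}
```

With this shape, a self-hosted deployment that sets only `CLICKHOUSE_URL` resolves to that value, while a deployment that sets both gets the engine-specific one.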

When the lookup returns no candidates, one bounded retry is scheduled ~5s later to cover ClickHouse replication lag against task_runs_v2. The Postgres status guard on both the candidate refetch and the inner updateMany prevents double-promotion when a retry races with a concurrent deploy.
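A bounded lag retry of this kind could be planned along these lines. The constants reflect the "one retry, ~5s" wording above; the function name, the `attempt` counter, and the job shape are hypothetical and only illustrate the bounding logic, not the engine's actual scheduling call.

```typescript
const MAX_LAG_RETRIES = 1; // the description mentions a single bounded retry
const LAG_RETRY_DELAY_MS = 5_000; // "~5s" per the description

// Hypothetical job shape for illustration.
type ScheduledJob = { backgroundWorkerId: string; attempt: number; delayMs: number };

function planLagRetry(
  backgroundWorkerId: string,
  attempt: number,
  candidateCount: number
): ScheduledJob | undefined {
  // Candidates were found, so there is no replication lag to wait out.
  if (candidateCount > 0) return undefined;
  // The retry budget is spent; stop instead of looping on an empty lookup.
  if (attempt >= MAX_LAG_RETRIES) return undefined;
  return { backgroundWorkerId, attempt: attempt + 1, delayMs: LAG_RETRY_DELAY_MS };
}
```

Carrying the attempt number in the job payload keeps the bound stateless: a retried job knows it is a retry without any shared counter.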

Tests cover three existing PENDING_VERSION cases via a small Postgres-backed test adapter; new ClickHouse-backed integration tests will follow.

@changeset-bot

changeset-bot Bot commented May 22, 2026

⚠️ No Changeset found

Latest commit: 98e8272

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8253413a-cd36-4c69-befb-d8a3344a4111

📥 Commits

Reviewing files that changed from the base of the PR and between 48c2bc4 and 98e8272.

📒 Files selected for processing (2)
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic imports. Only use dynamic import() when circular dependencies cannot be resolved otherwise, code splitting is needed for performance, or the module must be loaded conditionally at runtime.
Import from @trigger.dev/core using subpaths only - never import from the root.
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.
Add agentcrumbs markers (`// @crumbs` or `#region @crumbs`) as you write code, not just when debugging. They stay on the branch throughout development and are stripped by agentcrumbs strip before merge.

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting must be enforced using Prettier before committing

Files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
internal-packages/run-engine/src/engine/systems/**/*.ts

📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)

Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Files:

  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
🧠 Learnings (9)
📚 Learning: 2026-03-10T17:56:20.938Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3201
File: apps/webapp/app/v3/services/setSeatsAddOn.server.ts:25-29
Timestamp: 2026-03-10T17:56:20.938Z
Learning: Do not implement local userId-to-organizationId authorization checks inside org-scoped service classes (e.g., SetSeatsAddOnService, SetBranchesAddOnService) in the web app. Rely on route-layer authentication (requireUserId(request)) and org membership enforcement via the _app.orgs.$organizationSlug layout route. Any userId/organizationId that reaches these services from org-scoped routes has already been validated. Apply this pattern across all org-scoped services to avoid redundant auth checks and maintain consistency.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-14T08:21:07.614Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3614
File: apps/webapp/app/v3/mollifier/mollifierGate.server.ts:48-52
Timestamp: 2026-05-14T08:21:07.614Z
Learning: When using Trigger.dev v3 feature flags in the webapp, prefer the existing per-org gating mechanism supported by `flag()` via the `overrides` argument. Pass `Organization.featureFlags` (from `environment.organization.featureFlags`) as the `overrides` value; overrides must take precedence over the global `featureFlag` row. Do not require schema changes or add an `orgId` field to `FlagsOptions` for per-org gating—use the overrides pattern consistently (e.g., in gate flows like `resolveOrgFlag` and any server code that threads `environment.organization.featureFlags` into the gate call).

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
🔇 Additional comments (5)
apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts (1)

1-97: LGTM!

internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (4)

4-27: LGTM!


37-187: LGTM!


189-199: LGTM!


201-231: LGTM!


Walkthrough

This PR implements a pluggable ClickHouse-backed lookup for the Run Engine's pending-version discovery, reducing Postgres index reads by offloading candidate discovery to ClickHouse while re-validating by primary key in Postgres. It introduces a PendingVersionRunIdLookup contract (with noop and ClickHouse implementations), a dedicated run-engine ClickHouse client, a lightweight run_id query builder, and updates PendingVersionSystem and RunEngine wiring to perform two-step lookup, idempotent status promotion, and bounded lag-retries. Environment variables and test support are included.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description covers the core changes (ClickHouse lookup, pluggable interface, env var configuration, retry logic, testing approach) but does not follow the repository's required template structure with checklist, testing steps, and changelog sections. | Restructure the description to follow the template: add checklist items, dedicated Testing section with test steps, Changelog section summarizing changes, and close with a reference to the issue number. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main performance optimization: moving PENDING_VERSION run lookup from Postgres to ClickHouse, which is the primary change across the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (1)

176-178: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Continuation check should use lookup page saturation, not filtered row count.

pendingRuns.length is after Postgres re-validation, so stale candidates can hide that the lookup already returned a full maxCount + 1 page. That can stop pagination early and leave remaining PENDING_VERSION runs for later.

🐛 Suggested fix
-    //enqueue more if needed
-    if (pendingRuns.length > maxCount) {
-      await this.scheduleResolvePendingVersionRuns(backgroundWorkerId);
+    // Enqueue more if lookup page was saturated (maxCount + 1 sentinel)
+    if (candidateIds.length > maxCount) {
+      await this.scheduleResolvePendingVersionRuns(backgroundWorkerId, { attempt });
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts`
around lines 176 - 178, The pagination early-stop check should use the original
lookup saturation count rather than the post-DB-revalidation length; change the
if condition in pendingVersionSystem.ts to trigger
scheduleResolvePendingVersionRuns(backgroundWorkerId) when the lookup returned
more than maxCount (e.g., use the variable that holds the raw lookup result
count or a boolean like lookupPageSaturated) instead of checking
pendingRuns.length, so that revalidated/stale filtering does not prevent
scheduling additional pages.
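The difference between the two continuation checks can be seen in a small standalone sketch. It assumes, as the comment above does, that the lookup fetches `maxCount + 1` ids as a sentinel; the helper names and the `Run` shape are hypothetical.

```typescript
// Hypothetical row shape for illustration.
type Run = { id: string; status: string };

// A page longer than maxCount means the lookup hit its maxCount + 1 sentinel,
// so more candidates may remain regardless of how many survive re-validation.
function shouldScheduleNextPage(candidateIds: string[], maxCount: number): boolean {
  return candidateIds.length > maxCount;
}

// Postgres-style re-validation: keep only rows still in PENDING_VERSION.
function revalidate(candidateIds: string[], rows: Map<string, Run>): Run[] {
  return candidateIds
    .map((id) => rows.get(id))
    .filter((run): run is Run => run !== undefined && run.status === "PENDING_VERSION");
}
```

With `maxCount = 2` and three candidates of which one is stale, `revalidate` returns two rows, so a check on the filtered length would not continue, while `shouldScheduleNextPage` still reports that another page is needed.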
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts (1)

28-35: ⚡ Quick win

Use a type alias instead of an interface for the lookup contract.

This contract should follow the repo rule to prefer type over interface in TypeScript.

♻️ Suggested change
-export interface PendingVersionRunIdLookup {
+export type PendingVersionRunIdLookup = {
   /** Stable identifier for logs and metrics, e.g. "clickhouse", "test-noop". */
   readonly name: string;
 
   lookupPendingVersionRunIds(
     options: PendingVersionRunIdLookupOptions
   ): Promise<PendingVersionRunIdLookupResult>;
-}
+};

As per coding guidelines **/*.{ts,tsx}: Use types over interfaces for TypeScript.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts`
around lines 28 - 35, Replace the PendingVersionRunIdLookup interface with a
type alias named PendingVersionRunIdLookup that models the same shape (readonly
name: string and the lookupPendingVersionRunIds method returning
Promise<PendingVersionRunIdLookupResult>) to follow the repo rule preferring
type over interface; keep the existing PendingVersionRunIdLookupOptions and
PendingVersionRunIdLookupResult references and the exact method signature so
callers remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 81b91b7c-1246-4f22-832b-90bad66f0a8d

📥 Commits

Reviewing files that changed from the base of the PR and between ddad970 and 1794758.

📒 Files selected for processing (17)
  • .server-changes/pending-version-clickhouse-lookup.md
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/clickhouse/src/index.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/run-engine/src/index.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic imports. Only use dynamic import() when circular dependencies cannot be resolved otherwise, code splitting is needed for performance, or the module must be loaded conditionally at runtime.
Import from @trigger.dev/core using subpaths only - never import from the root.
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.
Add agentcrumbs markers (`// @crumbs` or `#region @crumbs`) as you write code, not just when debugging. They stay on the branch throughout development and are stripped by agentcrumbs strip before merge.

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting must be enforced using Prettier before committing

Files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
internal-packages/run-engine/src/engine/systems/**/*.ts

📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)

Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Files:

  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
internal-packages/run-engine/src/engine/tests/**/*.test.ts

📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)

Implement tests for RunEngine in src/engine/tests/ using testcontainers for Redis and PostgreSQL containerization

Files:

  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Unit tests should use vitest framework
Tests should avoid mocks or stubs and use helpers from @internal/testcontainers when Redis or Postgres are needed

**/*.test.{ts,tsx,js,jsx}: Never mock anything in tests - use testcontainers instead.
Test files should be placed next to source files (e.g., MyService.ts -> MyService.test.ts).

Files:

  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
🧠 Learnings (13)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/engine/systems/systems.ts
  • internal-packages/clickhouse/src/taskRuns.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts
  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
  • internal-packages/run-engine/src/engine/workerCatalog.ts
  • internal-packages/clickhouse/src/index.ts
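
The P1001 predicate described in the two learnings above can be sketched as a small helper. This is only an illustration: the function name `isDatabaseUnreachable` is hypothetical, and the sketch checks plain object fields rather than importing Prisma's error classes.

```typescript
// Hypothetical helper: returns true when an unknown error value carries
// Prisma's P1001 code on either of the two fields named in the learning.
function isDatabaseUnreachable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  // Both fields are optional because the two Prisma error classes expose
  // the code under different property names.
  const candidate = err as { code?: unknown; errorCode?: unknown };
  return candidate.code === "P1001" || candidate.errorCode === "P1001";
}
```

Checking both fields in one place keeps call sites from assuming a single error shape.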
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-14T08:21:07.614Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3614
File: apps/webapp/app/v3/mollifier/mollifierGate.server.ts:48-52
Timestamp: 2026-05-14T08:21:07.614Z
Learning: When using Trigger.dev v3 feature flags in the webapp, prefer the existing per-org gating mechanism supported by `flag()` via the `overrides` argument. Pass `Organization.featureFlags` (from `environment.organization.featureFlags`) as the `overrides` value; overrides must take precedence over the global `featureFlag` row. Do not require schema changes or add an `orgId` field to `FlagsOptions` for per-org gating—use the overrides pattern consistently (e.g., in gate flows like `resolveOrgFlag` and any server code that threads `environment.organization.featureFlags` into the gate call).

Applied to files:

  • apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.

Applied to files:

  • internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/pending-version-clickhouse-lookup.md
📚 Learning: 2026-03-26T09:02:07.973Z
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 3274
File: apps/webapp/app/services/runsReplicationService.server.ts:922-924
Timestamp: 2026-03-26T09:02:07.973Z
Learning: When parsing Trigger.dev task run annotations in server-side services, keep `TaskRun.annotations` strictly conforming to the `RunAnnotations` schema from `trigger.dev/core/v3`. If the code already uses `RunAnnotations.safeParse` (e.g., in a `#parseAnnotations` helper), treat that as intentional/necessary for atomic, schema-accurate annotation handling. Do not recommend relaxing the annotation payload schema or using a permissive “passthrough” parse path, since the annotations are expected to be written atomically in one operation and should not contain partial/legacy payloads that would require a looser parser.

Applied to files:

  • apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
📚 Learning: 2026-05-20T17:21:18.543Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3678
File: apps/webapp/app/entry.server.tsx:0-0
Timestamp: 2026-05-20T17:21:18.543Z
Learning: In env.server.ts (Zod env schema), any environment variable you plan to access via the typed `env` export (e.g., `env.SENTRY_DSN`) must be explicitly declared in the schema. For `SENTRY_DSN`, include `SENTRY_DSN: z.string().optional()`; otherwise switching from `process.env.SENTRY_DSN` to `env.SENTRY_DSN` will fail TypeScript typechecking.

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2026-03-10T17:56:20.938Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3201
File: apps/webapp/app/v3/services/setSeatsAddOn.server.ts:25-29
Timestamp: 2026-03-10T17:56:20.938Z
Learning: Do not implement local userId-to-organizationId authorization checks inside org-scoped service classes (e.g., SetSeatsAddOnService, SetBranchesAddOnService) in the web app. Rely on route-layer authentication (requireUserId(request)) and org membership enforcement via the _app.orgs.$organizationSlug layout route. Any userId/organizationId that reaches these services from org-scoped routes has already been validated. Apply this pattern across all org-scoped services to avoid redundant auth checks and maintain consistency.

Applied to files:

  • apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
🔇 Additional comments (17)
apps/webapp/app/env.server.ts (1)

1474-1488: LGTM!

.server-changes/pending-version-clickhouse-lookup.md (1)

1-6: LGTM!

apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts (1)

184-213: LGTM!

Also applies to: 259-260, 319-332, 399-400

internal-packages/clickhouse/src/taskRuns.ts (1)

382-399: LGTM!

internal-packages/clickhouse/src/index.ts (1)

16-17: LGTM!

Also applies to: 228-229

apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts (1)

1-92: LGTM!

internal-packages/run-engine/src/engine/index.ts (1)

68-68: LGTM!

Also applies to: 248-250, 300-302, 339-340

apps/webapp/app/v3/runEngine.server.ts (1)

9-9: LGTM!

Also applies to: 135-135

apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts (1)

1-24: LGTM!

internal-packages/run-engine/src/engine/tests/postgresPendingVersionLookup.ts (1)

1-43: LGTM!

internal-packages/run-engine/src/engine/tests/pendingVersion.test.ts (1)

7-7: LGTM!

Also applies to: 48-49, 196-197, 361-362

internal-packages/run-engine/src/engine/workerCatalog.ts (1)

44-51: LGTM!

internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts (1)

14-27: LGTM!

Also applies to: 37-48

internal-packages/run-engine/src/engine/types.ts (1)

22-23: LGTM!

Also applies to: 182-202

internal-packages/run-engine/src/engine/systems/systems.ts (1)

7-7: LGTM!

Also applies to: 22-22

internal-packages/run-engine/src/index.ts (1)

9-14: LGTM!

internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (1)

8-175: LGTM!

Also applies to: 181-223

@ericallam ericallam marked this pull request as ready for review May 22, 2026 15:21
devin-ai-integration[bot]

This comment was marked as resolved.

ericallam added 2 commits May 22, 2026 16:38
When the idempotency guard fires (concurrent worker already promoted the
run, updateMany returns count=0), the transaction returned early — but
the eventBus.emit('runStatusChanged') call was outside the transaction
and fired unconditionally. Have the transaction return a boolean and
guard the emit on it.
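
A minimal sketch of the pattern this commit message describes, assuming an in-memory stand-in for the `TaskRun` table and a toy event bus. The names `updateManyGuarded`, `promoteInTransaction`, and `promoteRun` are hypothetical and are not taken from the PR.

```typescript
type RunStatus = "PENDING_VERSION" | "PENDING";
type Run = { id: string; status: RunStatus };

// Hypothetical in-memory stand-in for the TaskRun table.
const runs = new Map<string, Run>([
  ["run_1", { id: "run_1", status: "PENDING_VERSION" }],
]);

// Mimics an updateMany with a status guard: reports how many rows changed.
function updateManyGuarded(id: string): { count: number } {
  const run = runs.get(id);
  if (!run || run.status !== "PENDING_VERSION") return { count: 0 };
  run.status = "PENDING";
  return { count: 1 };
}

// The "transaction" returns whether it actually promoted the run.
function promoteInTransaction(id: string): boolean {
  const { count } = updateManyGuarded(id);
  if (count === 0) return false; // guard fired: someone else promoted it
  return true;
}

// Toy event bus that records emitted run ids.
const emitted: string[] = [];
const listeners: Array<(runId: string) => void> = [(runId) => emitted.push(runId)];
function emitRunStatusChanged(runId: string): void {
  for (const listener of listeners) listener(runId);
}

function promoteRun(id: string): void {
  const promoted = promoteInTransaction(id);
  // Emit only when this call performed the promotion.
  if (promoted) emitRunStatusChanged(id);
}

promoteRun("run_1"); // promotes the run and emits once
promoteRun("run_1"); // guard fires, so no second emit
```

Returning a boolean from the transactional step keeps the emit decision tied to the outcome of the guarded update rather than to control flow inside the transaction.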
…rsionSystem

The RunEngineOptions has carried queueRunsWaitingForWorkerBatchSize for
a while but never threaded it through to PendingVersionSystem's
queueRunsPendingVersionBatchSize, so the option silently no-op'd and
the system always used the default of 200.
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (1)

76-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the continuation sentinel from the lookup page, not the filtered Postgres rows.

maxCount + 1 is acting as the "has more" probe, but Line 180 derives that from pendingRuns.length. If ClickHouse returns maxCount + 1 IDs and one of them is already past PENDING_VERSION, the Postgres guard drops it, pendingRuns.length falls back to maxCount, and the follow-up job is never scheduled. That can leave later PENDING_VERSION runs stranded until the next worker registration.

💡 Suggested fix
     const { runIds: candidateIds } = await this.$.pendingVersionRunIdLookup
       .lookupPendingVersionRunIds({
         organizationId: backgroundWorker.runtimeEnvironment.organizationId,
         projectId: backgroundWorker.projectId,
         environmentId: backgroundWorker.runtimeEnvironmentId,
         taskIdentifiers,
         queues,
         limit: maxCount + 1,
       });
+    const hasMoreCandidates = candidateIds.length > maxCount;

     if (!candidateIds.length) {
       await this.#maybeScheduleLagRetry(backgroundWorkerId, attempt, "lookup_empty");
       return;
     }

     const pendingRuns = await this.$.prisma.taskRun.findMany({
       where: {
         id: { in: candidateIds },
         status: "PENDING_VERSION",
       },
       orderBy: {
         createdAt: "asc",
       },
     });
+    const runsToProcess = pendingRuns.slice(0, maxCount);

-    for (const run of pendingRuns) {
+    for (const run of runsToProcess) {
       // ...
     }

-    if (pendingRuns.length > maxCount) {
+    if (hasMoreCandidates) {
       await this.scheduleResolvePendingVersionRuns(backgroundWorkerId);
     }

Also applies to: 96-104, 127-127, 179-181
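
The failure mode described above can be reproduced with a small self-contained sketch. The `planBatch` function and its types are hypothetical; the Postgres status guard is modelled as a set of ids that are still pending, and candidate order stands in for the `createdAt` ordering.

```typescript
type BatchPlan = { toProcess: string[]; hasMore: boolean };

// candidateIds: ids returned by the lookup, requested with limit = maxCount + 1.
// stillPending: ids that survive the status = 'PENDING_VERSION' guard.
function planBatch(
  candidateIds: string[],
  stillPending: Set<string>,
  maxCount: number
): BatchPlan {
  // Derive the continuation flag from the raw lookup page, not the filtered rows.
  const hasMore = candidateIds.length > maxCount;
  // Cap processing at maxCount after applying the guard.
  const toProcess = candidateIds
    .filter((id) => stillPending.has(id))
    .slice(0, maxCount);
  return { toProcess, hasMore };
}

// maxCount = 2, so three candidates means more rows may exist.
const candidates = ["a", "b", "c"];
// "b" was promoted concurrently, so the guard drops it.
const pending = new Set(["a", "c"]);
const plan = planBatch(candidates, pending, 2);
```

In this example only two rows survive the guard, so a check on the filtered length (`2 > 2`) would be false and skip the follow-up job, while the check on the raw candidate count keeps it scheduled.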

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts`
around lines 76 - 84, The lookup uses maxCount + 1 as a continuation probe but
the code later computes the "has more" sentinel from the filtered Postgres rows
(pendingRuns), which can drop IDs and mistakenly clear continuation; update the
logic in the pendingVersionSystem flow around
pendingVersionRunIdLookup.lookupPendingVersionRunIds so that any
"hasMore"/continuation decision is derived from the raw candidateIds length
returned by lookup (the runIds field) and not from the filtered pendingRuns list
produced after checking Postgres state (adjust code paths that compute
hasMore/next-page using pendingRuns to use candidateIds instead: the lookup call
site and subsequent continuation checks). Ensure you still cap processing to
maxCount but preserve the continuation flag based on candidateIds.length >
maxCount.
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (1)

74-89: ⚡ Quick win

Instrument the new lookup / lag-retry path.

This new cross-database branch only logs today, so rollout/debugging will be guesswork when runs stay in PENDING_VERSION. A span plus a low-cardinality metric for outcomes like empty | matched | filtered_out and whether a lag retry was scheduled would make this path observable without blowing up cardinality.

As per coding guidelines, internal-packages/run-engine/src/engine/systems/**/*.ts: Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability, and **/*.ts: ensure OTEL metric attributes have low cardinality.

Also applies to: 113-125, 202-227
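
One way to keep the metric attributes low-cardinality is to classify each lookup into the small set of outcomes listed above before recording it. The sketch below is illustrative only: `OutcomeCounter` is a hypothetical structural stand-in for a counter (it does not import the OTEL API), the span is omitted, and the reading of `filtered_out` as "candidates were found but none survived the Postgres guard" is an assumption.

```typescript
type LookupOutcome = "empty" | "matched" | "filtered_out";

// Hypothetical counter shape; only an add(value, attributes) method is assumed.
type OutcomeCounter = {
  add(
    value: number,
    attributes: { outcome: LookupOutcome; lag_retry_scheduled: boolean }
  ): void;
};

// Maps raw counts onto a bounded enum so no ids or unbounded integers
// end up as metric attributes.
function classifyOutcome(candidateCount: number, pendingCount: number): LookupOutcome {
  if (candidateCount === 0) return "empty";
  return pendingCount === 0 ? "filtered_out" : "matched";
}

function recordLookup(
  counter: OutcomeCounter,
  candidateCount: number,
  pendingCount: number,
  lagRetryScheduled: boolean
): LookupOutcome {
  const outcome = classifyOutcome(candidateCount, pendingCount);
  counter.add(1, { outcome, lag_retry_scheduled: lagRetryScheduled });
  return outcome;
}

// Fake counter that records what would have been exported.
const recorded: string[] = [];
const fakeCounter: OutcomeCounter = {
  add: (_value, attrs) => {
    recorded.push(`${attrs.outcome}:${attrs.lag_retry_scheduled}`);
  },
};

recordLookup(fakeCounter, 0, 0, true); // lookup returned nothing, retry scheduled
recordLookup(fakeCounter, 3, 0, false); // all candidates dropped by the guard
recordLookup(fakeCounter, 3, 2, false); // some candidates still pending
```

The counts themselves are used only to pick the enum value, so the exported attributes stay limited to one string union and one boolean.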

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts`
around lines 74 - 89, Wrap the lookupPendingVersionRunIds call in an
OpenTelemetry span and record a low-cardinality meter metric for the outcome
(values: "empty" | "matched" | "filtered_out") plus a boolean attribute
indicating whether a lag retry was scheduled; specifically, start a span before
calling this.$.pendingVersionRunIdLookup.lookupPendingVersionRunIds and set span
attributes for taskIdentifiers/queues size (not full contents), then after
getting candidateIds: emit a metric via a tracer/meter with attributes
outcome="empty" when candidateIds.length === 0 (or "matched" / "filtered_out"
for other branches), and when calling this.#maybeScheduleLagRetry include an
attribute lag_retry_scheduled=true (otherwise false); follow existing OTEL
helper usage in other RunEngine systems, keep attributes low-cardinality
(counts/booleans/enums only), and ensure the span is ended in all code paths.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 57c8f989-118f-4b2d-a9c1-4ec763d412fb

📥 Commits

Reviewing files that changed from the base of the PR and between 1794758 and 48c2bc4.

📒 Files selected for processing (2)
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: audit
  • GitHub Check: audit
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic imports. Only use dynamic import() when circular dependencies cannot be resolved otherwise, code splitting is needed for performance, or the module must be loaded conditionally at runtime.
Import from @trigger.dev/core using subpaths only - never import from the root.
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.
Add agentcrumbs markers (// @Crumbs or `#region `@crumbs) as you write code, not just when debugging. They stay on the branch throughout development and are stripped by agentcrumbs strip before merge.

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting must be enforced using Prettier before committing

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
internal-packages/run-engine/src/engine/systems/**/*.ts

📄 CodeRabbit inference engine (internal-packages/run-engine/CLAUDE.md)

Integrate OpenTelemetry tracer and meter instrumentation in RunEngine systems for observability

Files:

  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
🧠 Learnings (4)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/systems/pendingVersionSystem.ts (1)

128-157: LGTM!

internal-packages/run-engine/src/engine/index.ts (1)

246-250: LGTM!

Also applies to: 300-301, 336-341

devin-ai-integration[bot]

This comment was marked as resolved.

ericallam added 2 commits May 22, 2026 16:53
…ndingRuns

After the ClickHouse migration, pendingRuns.length is post-status-guard;
runs that have already left PENDING_VERSION between the CH lookup and
the Postgres refetch get filtered out. Using it as the more-work signal
under-reports when more candidates exist on the worker and stops short.
Switch to candidateIds.length, which is the raw lookup result.
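
To make the difference between the two counts concrete, here is a minimal sketch. It assumes the more-work check compares a count against a page size; `BATCH_SIZE`, `refetchPending`, `hasMoreWork`, and the `Run` shape are hypothetical names for illustration, not the engine's actual code.

```typescript
// Hypothetical shapes for illustration; not the run engine's real types.
type Run = { id: string; status: "PENDING_VERSION" | "QUEUED" };

const BATCH_SIZE = 3; // assumed page size for one lookup

// Stand-in for the Postgres refetch by primary key with the status guard.
function refetchPending(candidateIds: string[], table: Map<string, Run>): Run[] {
  return candidateIds
    .map((id) => table.get(id))
    .filter((run): run is Run => run !== undefined && run.status === "PENDING_VERSION");
}

// More work may remain whenever the raw lookup filled the page,
// no matter how many rows survive the status guard.
function hasMoreWork(candidateIds: string[]): boolean {
  return candidateIds.length >= BATCH_SIZE;
}

const table = new Map<string, Run>([
  ["run_1", { id: "run_1", status: "PENDING_VERSION" }],
  ["run_2", { id: "run_2", status: "QUEUED" }], // left PENDING_VERSION after the lookup
  ["run_3", { id: "run_3", status: "PENDING_VERSION" }],
]);

const candidateIds = ["run_1", "run_2", "run_3"]; // raw lookup result, full page
const pendingRuns = refetchPending(candidateIds, table); // post-guard, one row dropped
```

In this sketch the lookup returned a full page, but one run had already moved on, so a check based on `pendingRuns.length` would see 2 and stop early, while `candidateIds.length` still reports a full page.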

Factory resolution failures (registry misload, missing data store,
ClientType mismatch) are configuration problems, not transient blips,
and ops loses observability if they only surface as warnings. Query-
level errors stay at warn since those are expected to be transient.
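
A minimal sketch of that split, assuming failures are tagged by kind before logging. The `LookupFailure` type and `logLevelFor` function are hypothetical and only illustrate the policy described in the commit message.

```typescript
// Hypothetical failure tagging; the real code may distinguish these differently.
type LookupFailure =
  | { kind: "factory_resolution"; reason: string } // registry misload, missing data store, ClientType mismatch
  | { kind: "query"; reason: string }; // query-level error, expected to be transient

type LogLevel = "warn" | "error";

// Configuration problems log at error; transient query failures stay at warn.
function logLevelFor(failure: LookupFailure): LogLevel {
  return failure.kind === "factory_resolution" ? "error" : "warn";
}

const misconfigured = logLevelFor({ kind: "factory_resolution", reason: "missing data store" });
const transient = logLevelFor({ kind: "query", reason: "timeout" });
```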
@ericallam ericallam merged commit 61ca40b into main May 22, 2026
38 checks passed
@ericallam ericallam deleted the perf/pending-version-clickhouse-lookup branch May 22, 2026 16:18