Skip to content

feat(webapp): mollifier trigger-time decisions — mollify, claim, read fallback#3753

Open
d-cs wants to merge 13 commits into
mollifier-phase-3-bufferfrom
mollifier-phase-3-trigger
Open

feat(webapp): mollifier trigger-time decisions — mollify, claim, read fallback#3753
d-cs wants to merge 13 commits into
mollifier-phase-3-bufferfrom
mollifier-phase-3-trigger

Conversation

@d-cs
Copy link
Copy Markdown
Collaborator

@d-cs d-cs commented May 26, 2026

Summary

The trigger hot path's mollifier integration:

  • mollifyTrigger: when the gate trips, write the engine.trigger snapshot to the buffer and return a synthesised QUEUED response. Postgres write is deferred to drainer-replay (next PR in the stack).
  • Pre-gate idempotency-key claim: same-key triggers serialise through Redis so a burst lands in PG / buffer exactly once.
  • Read-fallback extensions: findRunByIdWithMollifierFallback for the trigger-time idempotency lookup that must see buffered runs.
  • Gate bypasses: debounce, oneTimeUseToken, parentTaskRunId/triggerAndWait skip the mollify path entirely.
  • triggerTask + IdempotencyKeyConcern wired to the above.

All behaviour gated by the master TRIGGER_MOLLIFIER_ENABLED switch; off-state hot path is unchanged (the gate is not even consulted).

Stacked on the buffer extensions PR.

Test plan

  • `pnpm run typecheck --filter webapp` passes
  • `pnpm run test --filter webapp test/mollifierMollify.test.ts` passes
  • `pnpm run test --filter webapp test/mollifierIdempotencyClaim.test.ts` passes
  • `pnpm run test --filter webapp test/mollifierReadFallback.test.ts` passes
  • `pnpm run test --filter webapp test/mollifierGate.test.ts` passes
  • `pnpm run test --filter webapp test/engine/triggerTask.test.ts` passes

Ship-gate follow-up fixes

  • Batch items bypass the mollifier gate — fixes BatchTaskRunItem_taskRunId_fkey FK violation on batch triggers when the gate trips. End-state is a drainer-side BatchTaskRunItem create-on-materialise; batch traffic passes through the gate until that lands.
  • IdempotencyKeyConcern honours buffered-run TTL on expiry — buffered path now clears expired idempotency claims (read-side) and resets the buffer's mollifier:idempotency:* SETNX binding (write-side) so a re-trigger past the customer's TTL lands as a fresh run instead of echoing the stale buffered runId.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 26, 2026

⚠️ No Changeset found

Latest commit: 68db676

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bbe2875b-02ce-4825-ab22-678795642608

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mollifier-phase-3-trigger

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread apps/webapp/app/v3/mollifier/readFallback.server.ts
@d-cs d-cs self-assigned this May 26, 2026
@d-cs d-cs force-pushed the mollifier-phase-3-trigger branch 2 times, most recently from 5a7bc19 to baa6f17 Compare May 26, 2026 13:24
devin-ai-integration[bot]

This comment was marked as resolved.

@d-cs d-cs force-pushed the mollifier-phase-3-trigger branch from 01f3958 to 449a0bc Compare May 27, 2026 12:04
@d-cs d-cs force-pushed the mollifier-phase-3-buffer branch from e85e771 to 9298706 Compare May 27, 2026 12:15
@d-cs d-cs force-pushed the mollifier-phase-3-trigger branch from 449a0bc to ffe51b8 Compare May 27, 2026 12:15
@d-cs d-cs force-pushed the mollifier-phase-3-buffer branch from 9298706 to f4c5b21 Compare May 27, 2026 12:21
@d-cs d-cs force-pushed the mollifier-phase-3-trigger branch 6 times, most recently from cae33fa to 16bfff0 Compare May 27, 2026 16:50
@d-cs d-cs marked this pull request as ready for review May 28, 2026 09:03
d-cs and others added 9 commits May 28, 2026 10:40
… fallback

The trigger hot path's mollifier integration:
- `mollifyTrigger`: when the gate trips, write the engine.trigger
  snapshot to the buffer and return a synthesised QUEUED response.
- Pre-gate idempotency-key claim: same-key triggers serialise through
  Redis so a burst lands in PG / buffer exactly once.
- Read-fallback extensions: `findRunByIdWithMollifierFallback` for the
  trigger-time idempotency lookup that must see buffered runs.
- Gate bypasses: debounce, oneTimeUseToken, parentTaskRun
  (triggerAndWait) skip the mollify path entirely.
- triggerTask + IdempotencyKeyConcern wired to the above.

Stacked on buffer extensions PR. All behaviour gated by the master
`TRIGGER_MOLLIFIER_ENABLED` switch; off-state hot path is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`new Date("not-a-date")` returns a truthy Invalid Date object, which would
mis-classify the run as CANCELED in the read-fallback synthesised shape.
Add an `asDate` helper that rejects NaN-valued parses and use it for
`cancelledAt` and `delayUntil`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The buffer's claim API now requires a caller-supplied ownership token
so compare-and-act protects the slot against a stale predecessor.
Wires the token end-to-end:

- `claimOrAwait` generates the token (UUID) up front and reuses it
  across the retry path; returns it on the `claimed` outcome.
- `publishClaim` and `releaseClaim` wrappers accept and forward the
  token to the buffer.
- `ClaimedIdempotency` carries the token so the trigger pipeline can
  publish or release with the same token it claimed under.
- `triggerTask.server.ts` threads the token into the publish call.

Tests pin token round-trip: claimOrAwait → claimIdempotency, plus the
publish and release pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t leak fix)

Devin review flagged a security + correctness bug: the mollify path
returned \`MollifySyntheticResult\` with run shape
\`{ friendlyId, spanId }\` and the call site cast it to
\`TriggerTaskServiceResult\` (which expects \`run: TaskRun\` with an
\`id: string\`). Downstream, the trigger route calls
\`saveRequestIdempotency(requestKey, "trigger", result.run.id)\` — so
the cache stored \`undefined\` as the entity id. On SDK retry the
request-idempotency flow then ran
\`prisma.taskRun.findFirst({ where: { id: undefined } })\`. Prisma
strips \`undefined\` from where clauses, so the query degenerates to an
unfiltered \`findFirst\` and returns an arbitrary TaskRun row —
potentially from another env / user.

Fix:
- Add \`id: string\` to \`MollifySyntheticResult.run\`.
- Compute it via \`RunId.fromFriendlyId(...)\` on both branches:
  the happy-accept path (id from \`args.runFriendlyId\`) and the
  \`duplicate_idempotency\` race-loser path (id from
  \`result.existingRunId\` so the response carries the WINNER's id).
- Add regression tests pinning both branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uard

- readFallback: read snapshot.taskVersion (the key buildEngineTriggerInput writes) instead of the nonexistent snapshot.lockToVersion, so buffered version-locked runs report their locked version; test now uses the real key as a regression guard.

- env: TRIGGER_MOLLIFIER_TRIP_THRESHOLD back to positive() (matching sibling mollifier numerics) to forbid threshold=0 silently mollifying every trigger.

- idempotencyKeys: document why the resolved-but-unfindable fall-through is safe (PG-unique + accept SETNX dedup + ~30s claim TTL self-heal); add regression test pinning the fall-through and the resolved-and-findable cached-hit path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove plan-tracking shorthand (Q#, C#, F#, Phase N, _plans/) from trigger-layer mollifier comments and test names; reword to plain English. Comment/test-name only; no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f-state token alloc

triggerTask.server.ts: mollifier.buffered logs at debug, not info — it fires per mollified trigger during a burst, exactly when info-level would flood aggregation.

idempotencyClaim.server.ts: move the buffer-null fall-open check above token generation so the mollifier-OFF path skips the default randomUUID() for idempotency-keyed triggers (triggerTask is the highest-throughput path). Injected test generators still honoured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt, docs URL

idempotencyKeys.server.ts: reword 'bypasses the gate via F4' to 'bypasses the gate entirely' — last residual internal planning-label reference I missed in the earlier cleanup pass.

triggerTask.server.ts: reindent the startSpan arrow-function body to conventional depth (+4 sp on lines 143-619). The body was under-indented relative to the 'async (span) => {' opener after the try-wrapper for claim resolution was added around it; closes (arrow at 8sp, startSpan at 6sp) were already conventional, so only the body needed shifting. Pure whitespace — diff stat is balanced 413/413.

mollifierMollify.server.ts: docs URL → https://trigger.dev/docs/management/tasks/batch-trigger. The previous burst-handling anchor doesn't exist on the docs site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-trigger branch from f126737 to 36cc024 Compare May 28, 2026 09:41
devin-ai-integration[bot]

This comment was marked as resolved.

Addresses Devin's review on PR #3753 (idempotencyKeys.server.ts comment 2026-05-28): when TRIGGER_MOLLIFIER_ENABLED=1 globally during staged rollout, claimOrAwait was issuing a Redis SETNX for every idempotency-keyed trigger — including orgs that had not opted in. Those orgs gain nothing from the claim (the gate always returns pass_through for them, so the buffer is never written to) and PG unique constraint already deduplicates same-key races on the pass-through path.

Wrap the claim block in an additional per-org check using the same in-memory Organization.featureFlags predicate that evaluateGate uses (makeResolveMollifierFlag) — no DB query. Non-opted-in orgs now skip claimOrAwait entirely; opted-in orgs still get the cross-store race coordination.

Regression test added in mollifierClaimResolution.test.ts: with orgFlag=false the concern returns isCached:false with no claim, without calling claimIdempotency at all. The two prior tests now opt the fake org in (organization.featureFlags.mollifierEnabled=true) so they still exercise the resolution branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Addresses two Devin findings on PR #3753 (readFallback.server.ts:206 comments 2026-05-28).

1) batchId field-name mismatch: #buildEngineTriggerInput in triggerTask.server.ts writes batch info as a nested object 'batch: { id, index }', not as a flat 'batchId'. readFallback was reading the non-existent snapshot.batchId so SyntheticRun.batchId was always undefined for buffered runs. Now reads snapshot.batch.id (the internal cuid, matching what PG stores in TaskRun.batchId).

2) parentTaskRunFriendlyId / rootTaskRunFriendlyId structurally unfillable: the snapshot carries the INTERNAL parent/root ids (parentTaskRunId / rootTaskRunId, what engine.trigger consumes), not friendlyIds. Convert internal to friendly via RunId.toFriendlyId so consumers do not have to special-case the buffered path.

Four regression tests in mollifierReadFallback.test.ts: batchId extracted from nested snapshot.batch.id; a flat snapshot.batchId key is ignored (belt-and-braces); parent/root friendly conversion round-trips through RunId.generate(); parent/root undefined when the snapshot has no parent context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs and others added 2 commits May 29, 2026 13:06
…tency expiry

PG-resident path enforces `idempotencyKeyExpiresAt`: when an existing
PG row is found, the lookup compares its `expiresAt` against now,
clears the key on expiry, and lets a new run go through. The buffered
path was missing this — `findBufferedRunWithIdempotency` returned any
buffered run whose snapshot carried the key, regardless of how long
ago the customer's TTL had elapsed. `idempotencyKeyTTL: "2s"` plus
a second trigger 4s later would return the original buffered runId.

Two layers needed the fix:

1. **Read-side (this concern).** Surface
   `idempotencyKeyExpiresAt` on `SyntheticRun` (the snapshot already
   stores it; `findRunByIdWithMollifierFallback` just wasn't exposing
   it). In `findBufferedRunWithIdempotency`, apply the same
   `expiresAt < new Date()` check as the PG path and return null on
   expiry.

2. **Write-side (the buffer's accept dedupe).** Returning null from
   step 1 isn't enough: the trigger pipeline then proceeds to
   `mollifyTrigger`, whose `buffer.accept` Lua dedupes by
   `(envId, taskIdentifier, idempotencyKey)` via SETNX on the same
   `mollifier:idempotency:*` key and would still echo the stale runId
   as `duplicate_idempotency`. On expiry, clear the buffer-side
   idempotency binding via `buffer.resetIdempotency` — the same
   primitive `ResetIdempotencyKeyService` uses for the explicit
   reset-via-API path. The next accept then goes through as a fresh
   trigger.

Verified end-to-end: with mollifier active and
`idempotencyKeyTTL: "2s"`, a same-key retrigger after 4s returns a
new runId; same-key retriggers within the TTL still dedupe to the
original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch triggers crash with HTTP 500 "Foreign key constraint violated on
the constraint BatchTaskRunItem_taskRunId_fkey" whenever the mollifier
gate is active. The crash chain:

1. batchTriggerV3 generates a run id per item, calls
   `triggerTaskService.call` with `options.batchId` set.
2. The mollifier gate trips, the item is buffered, the mollify response
   returns a stripped run shape `{ id, friendlyId, spanId }` with no
   PG row.
3. batchTriggerV3 then tries `prisma.batchTaskRunItem.create({
   taskRunId: result.run.id, ... })` — the FK to TaskRun.id fails
   because no row was written.

The proper fix requires both ends: skip the join-row create at
trigger-time AND create it on the drainer-side materialise. The
drainer fix is non-trivial (the drainer / replay PR territory, not
the dashboard PR) and the snapshot already carries
`batch: { id, index }` so it has the info — but until that's wired
through to a `BatchTaskRunItem.create` call on materialise, skipping
trigger-time would silently lose the batch <-> run link forever and
break batch progress reporting + `batchTriggerAndWait` parent
resumption.

Short-circuit the gate when `options.batchId` is set so batch items
always go straight to PG. Batch triggers lose the burst-protection
benefit of the mollifier; single triggers and triggerAndWait are
unaffected. Removing this bypass is the natural follow-up once the
drainer-side BatchTaskRunItem path lands in the appropriate earlier PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs
Copy link
Copy Markdown
Collaborator Author

d-cs commented May 29, 2026

Pushed 2 follow-up fixes:

  • Batch items bypass the mollifier gate. Without this, batch triggers crash with BatchTaskRunItem_taskRunId_fkey FK violations as soon as the gate trips. The mollify path returns a stripped run-shape with no PG row, but BatchTaskRunItem.taskRunId is a NOT NULL FK to TaskRun.id and the join row is created at trigger time. End-state is a drainer-side BatchTaskRunItem create-on-materialise (snapshot already carries batch: { id, index }), but until that lands, batch triggers pass through — they lose the burst-protection benefit, but the path is end-to-end correct.
  • IdempotencyKeyConcern honours buffered-run TTL on expiry. The PG-resident path enforced `idempotencyKeyExpiresAt` by clearing the key on expiry; the buffered path was missing this and would return the original buffered runId indefinitely (`idempotencyKeyTTL: "2s"` plus a second trigger 4s later still got the cached run). Fix touches both read-side (surface `idempotencyKeyExpiresAt` on `SyntheticRun`, apply `< new Date()` check in `findBufferedRunWithIdempotency`) and write-side (clear the buffer's idempotency lookup via `buffer.resetIdempotency` on expiry so the next accept goes through as a fresh trigger).

Verified end-to-end against a buffered run; webapp typecheck clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants