[backport] [world-vercel] Use stream control frame for transparent reconnection#1766
[backport] [world-vercel] Use stream control frame for transparent reconnection#1766VaguelySerious merged 5 commits intostablefrom
Conversation
Add hold-back buffer to readFromStream that detects the 13-byte stream control frame appended by workflow-server on timeout. When the server signals a timeout (done=false), the client transparently reconnects from the next chunk index. Caps reconnections at 50 (~100 min). Network errors propagate to consumers via controller.error() instead of silently closing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: d78a4d4 The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (12 failed)astro (1 failed):
fastify (1 failed):
hono (1 failed):
nextjs-turbopack (3 failed):
nextjs-webpack (3 failed):
sveltekit (1 failed):
vite (2 failed):
🌍 Community Worlds (68 failed)mongodb-dev (1 failed):
redis-dev (1 failed):
turso-dev (1 failed):
turso (65 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
TooTallNate
left a comment
There was a problem hiding this comment.
The code itself is a clean backport of the reconnection logic from #1742, but there's a fundamental protocol mismatch that makes it non-functional.
Re: why not use the backport Action: The repo has a Backport to stable workflow (.github/workflows/backport.yml) that triggers on merged PRs with the backport-stable label. PR #1742 has no labels and hasn't been merged yet, so the action was never triggered. Once #1742 is merged with the backport-stable label, the action would cherry-pick it automatically (or use OpenCode to resolve conflicts if the cherry-pick fails). That said, cherry-picking #1742 would have the same protocol mismatch described below, so a manual backport with adaptation is the right call — it just needs a different approach to signal control frame support.
Changes reviewed:
parseStreamControlFrame,concatUint8Arrays,STREAM_CONTROL_FRAME_SIZE— identical to main, correct.- Hold-back buffer + reconnection loop in
readFromStream— identical logic to main'sstreams.get, withMAX_RECONNECTS=50andcontroller.error(err)on network errors. All good. - Unit tests for
parseStreamControlFrame— identical to main, 8 tests. Good. - Integration tests for reconnection — 3 tests (reconnect, backward compat, error propagation). Good coverage.
- Changeset — correct.
| }); | ||
| if (!response.ok) { | ||
| throw new Error(`Failed to fetch stream: ${response.status}`); | ||
| } |
There was a problem hiding this comment.
Blocking: ?controlFrame=1 is not supported by workflow-server. The server decides whether to emit the control frame based solely on the API version from the URL path:
// workflow-server/lib/handlers/stream.ts:420
const enableControlFrame = getApiVersion(req) >= 3;getApiVersion() is set by the v3 Hono sub-app middleware when the request path starts with /api/v3/. A query parameter has no effect.
On main, PR #1742 solved this by switching getStreamUrl to use /v3/runs/:runId/stream/:name. But on stable, readFromStream uses the deprecated route /v2/stream/:name (no runId), and the v3 API doesn't register that deprecated route — only /v3/runs/:runId/stream/:streamId exists.
So the control frame will never be emitted for this code path, and the reconnection logic is dead code.
Options to fix:
- If
readFromStreamonstablehas access to arunIdsomewhere (even via a different caller), thread it through and use/v3/runs/:runId/stream/:name - Add server-side support for
?controlFrame=1on the v2 deprecated stream route (a small change ingetStreamHandler) - Register the deprecated
/stream/:streamIdroute on the v3 sub-app inapp.ts(one line)
There was a problem hiding this comment.
Fixed in 8781b5f — switched to /v3/stream/:name (option 3). Companion server PR: vercel/workflow-server#387 registers the deprecated stream route on the v3 sub-app.
|
I suppose you did a "manual" backport because the World API is different on |
The server enables control frames based on API version (v3+), not a query param. Use the v3 deprecated stream route which will be registered on the server via vercel/workflow-server#387. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: vercel[bot] <35613825+vercel[bot]@users.noreply.github.com> Signed-off-by: Peter Wielander <mittgfu@gmail.com>
… with streaming hold-back buffer Re-applies #1742's reconnect feature on stable, without the arrayBuffer() rewrite that shipped in 4.2.3 (#1766) and defeated incremental streaming. - readFromStream returns the ReadableStream promptly; a hold-back buffer inside the pull loop holds only the last 13 bytes, forwarding surplus bytes as they arrive. This preserves incremental delivery for AI UIs. - On upstream close, the tail is parsed for a stream control frame (done=true → close; done=false → reconnect from nextIndex; no frame → forward tail as data for older servers). - Network errors propagate via controller.error() rather than silent close. - Caps reconnections at 50 (~100 min of streaming at 2-min server timeout). Adds a regression test that asserts data reaches the consumer before upstream close — any buffer-then-replay implementation fails it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… with streaming hold-back buffer Re-applies #1742's reconnect feature on stable, without the arrayBuffer() rewrite that shipped in 4.2.3 (#1766) and defeated incremental streaming. - readFromStream returns the ReadableStream promptly; a hold-back buffer inside the pull loop holds only the last 13 bytes, forwarding surplus bytes as they arrive. This preserves incremental delivery for AI UIs. - On upstream close, the tail is parsed for a stream control frame (done=true → close; done=false → reconnect from nextIndex; no frame → forward tail as data for older servers). - Network errors propagate via controller.error() rather than silent close. - Caps reconnections at 50 (~100 min of streaming at 2-min server timeout). Adds a regression test that asserts data reaches the consumer before upstream close — any buffer-then-replay implementation fails it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Summary
Backport of #1742 to
stable.readFromStreamthat detects the 13-byte stream control frame appended by workflow-server on timeoutdone=false), the client transparently reconnects from the next chunk index using?controlFrame=1controller.error()instead of silently closingTest plan
parseStreamControlFrame(8 tests)🤖 Generated with Claude Code