[world-vercel] Use stream close control frame to decide whether to reconnect to stream #1742

VaguelySerious wants to merge 12 commits into main
Conversation
# Conflicts:
#	packages/world-vercel/src/streamer.test.ts
#	packages/world-vercel/src/streamer.ts
🦋 Changeset detected. Latest commit: 552c2eb. The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages.
🧪 E2E Test Results

❌ Some tests failed

Failed tests in 🌍 Community Worlds (74 failed):

- mongodb: 7 failed
- redis: 7 failed
- turso: 60 failed

Details by category:

- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
- ✅ 📋 Other
📊 Benchmark Results

(Results table omitted; the per-run numbers did not survive extraction. Scenarios measured: workflow with no steps; workflow with 1, 10, 25, and 50 sequential steps; Promise.all with 10, 25, and 50 concurrent steps; Promise.race with 10, 25, and 50 concurrent steps; workflow with 10, 25, and 50 sequential and concurrent data payload steps (10KB); and stream benchmarks including TTFB metrics: workflow with stream, stream pipeline with 5 transform steps (1MB), 10 parallel streams (1MB each), and fan-out fan-in with 10 streams (1MB each). Each scenario ran in 💻 Local Development and ▲ Production (Vercel), with 🔍 Observability links for Nitro and Next.js (Turbopack). Summary tables reported the fastest framework by world and the fastest world by framework, winner determined by most benchmark wins.)
- Fix streamer-reconnect.test.ts: use streams.get(runId, name) instead of the non-existent readFromStream method
- Update stream GET URL from v2 to v3 to receive control frames
- Remove unused readFromStream from the Streamer interface
- Reset WORKFLOW_SERVER_URL_OVERRIDE to empty string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
```diff
  */
-const WORKFLOW_SERVER_URL_OVERRIDE = '';
+const WORKFLOW_SERVER_URL_OVERRIDE =
+  'https://workflow-server-git-peter-stream-control-frame.vercel.sh';
```
Don't forget to remove this before merging
TooTallNate left a comment:
The reconnection design is well thought out. The hold-back buffer approach for control frame detection is correct and handles all edge cases I traced through (split frames, coalesced chunks, empty streams, short data). The test coverage is thorough. But there's one blocking issue and a couple of concerns.
Cross-reference with server-side PR 367: I reviewed the server PR and left a REQUEST_CHANGES there too. The server currently only emits the control frame on timeout (enableControlFrame && maxDurationReached), NOT on natural stream completion. This client correctly handles that via the "no control frame" fallback path — so the two PRs are compatible as-is. However, the server PR has 6 failing integration tests due to a test/code mismatch introduced in its last commits. This client PR should not be merged until the server PR is resolved and deployed.
Changes reviewed:

- `parseStreamControlFrame` (streamer.ts): Correct mirror of the server's format. Checks the zero marker and magic footer, then parses flags and nextIndex.
- Hold-back buffer (streamer.ts:278-348): Retains the last 13 bytes during streaming to detect the control frame at close. Handles split frames, coalesced chunks, and backward compat (no frame) gracefully.
- Reconnection loop (streamer.ts:335-345): On `done=false`, reconnects from `control.nextIndex`. Clean loop.
- Error handling (streamer.ts:289-296): Network errors flush the buffer and close without retrying. Correct.
- Unit tests (streamer.test.ts): Good coverage of `parseStreamControlFrame` edge cases.
- Integration tests (streamer-reconnect.test.ts): Comprehensive, covering single reconnects, multiple reconnects, split frames, coalesced frames, empty streams, backwards compat, network errors, and non-200 responses.
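For readers without access to the diff, the parser under review can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's code: the 13-byte frame length matches the review above, but the exact field layout, the magic footer value (`WFCF`), and the big-endian encoding are assumptions made for illustration.

```typescript
// Assumed 13-byte layout: [0x00 marker (1)] [flags u32 BE (4)]
//                         [nextIndex u32 BE (4)] [magic "WFCF" (4)]
const FRAME_LENGTH = 13;
const MAGIC = new TextEncoder().encode("WFCF"); // assumed magic footer

interface StreamControlFrame {
  done: boolean;     // true: stream finished; false: reconnect from nextIndex
  nextIndex: number; // offset to resume from on reconnect
}

function parseStreamControlFrame(tail: Uint8Array): StreamControlFrame | null {
  if (tail.length < FRAME_LENGTH) return null; // short data: no frame
  const frame = tail.subarray(tail.length - FRAME_LENGTH);
  if (frame[0] !== 0x00) return null; // zero marker missing
  for (let i = 0; i < MAGIC.length; i++) {
    // Magic footer guards against payloads that happen to end in a zero byte.
    if (frame[9 + i] !== MAGIC[i]) return null;
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const flags = view.getUint32(1); // big-endian by default
  return { done: (flags & 1) === 1, nextIndex: view.getUint32(5) };
}

// Encoder counterpart, used here only to exercise the parser.
function encodeStreamControlFrame(done: boolean, nextIndex: number): Uint8Array {
  const buf = new Uint8Array(FRAME_LENGTH);
  const view = new DataView(buf.buffer);
  view.setUint32(1, done ? 1 : 0);
  view.setUint32(5, nextIndex);
  buf.set(MAGIC, 9);
  return buf;
}
```

Returning `null` for short data and for missing marker/magic is what makes the "no control frame" backward-compat path above fall out naturally.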
```diff
  */
-const WORKFLOW_SERVER_URL_OVERRIDE = '';
+const WORKFLOW_SERVER_URL_OVERRIDE =
+  'https://workflow-server-git-peter-stream-control-frame.vercel.sh';
```
Blocking: This hardcodes WORKFLOW_SERVER_URL_OVERRIDE to a preview deployment URL. This must be reverted to '' before merging — it would route all production traffic to a preview branch deployment.
(I left a comment about this earlier, flagging again as blocking since it's still present.)
```ts
    controller.enqueue(tailBuffer);
    tailBuffer = new Uint8Array(0);
  }
  controller.close();
```
Non-blocking concern: On network error, the code swallows the error and calls controller.close() instead of controller.error(err). This means consumers of the returned ReadableStream see a clean close rather than an error, and have no way to know the stream was truncated.
The reconnect test asserts the partial data is received and only 1 fetch is made, which is correct for "no retry on errors." But silently closing on error could be a footgun — the consumer (the workflow VM) would think it received the complete stream.
Consider either:

- Propagating the error via `controller.error(err)` so consumers can handle it
- Or, at minimum, logging a warning
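The suggested behavior can be sketched like this. This is a minimal illustration of the reviewer's point, not the PR's actual code; the wrapper name and the elided hold-back logic are stand-ins.

```typescript
// On a network error: flush any held-back bytes, then error the stream so
// consumers can distinguish truncation from a clean close.
function relayWithErrorPropagation(
  source: AsyncIterable<Uint8Array>,
): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      let tailBuffer = new Uint8Array(0); // hold-back bytes (detection logic elided)
      try {
        for await (const chunk of source) {
          controller.enqueue(chunk);
        }
        controller.close(); // clean end: consumer sees done=true
      } catch (err) {
        if (tailBuffer.length > 0) {
          controller.enqueue(tailBuffer); // flush partial data first
          tailBuffer = new Uint8Array(0);
        }
        // error() instead of close(): the consumer's read() rejects,
        // signaling a truncated stream rather than a complete one.
        controller.error(err);
      }
    },
  });
}
```

With `controller.close()` here instead, the consumer's final `read()` would resolve with `done: true` and the truncation would be invisible.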
Fixed in 6ee6e7d — network errors now propagate via controller.error(err) instead of silently closing.
```diff
 }
 const response = await fetch(url, {
   headers: httpConfig.headers,
+let currentStartIndex = startIndex ?? 0;
```
Non-blocking: There's no upper bound on reconnection attempts. If the server keeps returning done=false (e.g., due to a bug where the stream never completes), this will loop indefinitely. Consider adding a maxReconnects counter (e.g., 50 — enough for ~100 minutes of streaming at 2-minute timeouts) with a warning log when exhausted.
Fixed in 6ee6e7d — added MAX_RECONNECTS = 50 cap (~100 min at 2-min server timeout). Errors with a descriptive message when exhausted.
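The capped loop can be sketched as below. This is an illustration of the shape of the fix under stated assumptions: the `connect` callback, `ConnectResult` type, and index semantics are stand-ins for the PR's fetch-and-parse logic, not its real API.

```typescript
const MAX_RECONNECTS = 50; // ~100 min of streaming at 2-minute server timeouts

interface ConnectResult {
  chunks: Uint8Array[];
  // null = no control frame seen (older server); treat as complete.
  control: { done: boolean; nextIndex: number } | null;
}

async function readWholeStream(
  connect: (startIndex: number) => Promise<ConnectResult>,
  startIndex = 0,
): Promise<Uint8Array[]> {
  const out: Uint8Array[] = [];
  let currentStartIndex = startIndex;
  for (let attempt = 0; ; attempt++) {
    const { chunks, control } = await connect(currentStartIndex);
    out.push(...chunks);
    // No frame (backward compat) or done=true: the stream is complete.
    if (!control || control.done) return out;
    if (attempt >= MAX_RECONNECTS) {
      // Cap exhausted: fail loudly instead of looping forever on a
      // server that never reports completion.
      throw new Error(
        `stream did not complete after ${MAX_RECONNECTS} reconnects`,
      );
    }
    currentStartIndex = control.nextIndex; // resume where the server stopped
  }
}
```

The cap turns a server-side "never completes" bug from an infinite loop into a descriptive error after a bounded amount of work.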
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
- Propagate network errors via controller.error() instead of silently closing, so consumers know the stream was truncated
- Add MAX_RECONNECTS (50) cap to prevent infinite loops if the server never completes the stream

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The override was accidentally reset to empty in f92a411, causing e2e tests to hit production workflow-server which doesn't have the v3 stream endpoint yet. Restore to the preview deploy URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 80c756f.
All feedback addressed. LGTM.
What changed since my last review:

- `WORKFLOW_SERVER_URL_OVERRIDE`: Reverted; utils.ts is no longer in the diff.
- Error propagation: `controller.close()` changed to `controller.error(err)` on network errors. The test was updated to assert `rejects.toThrow('connection reset')`.
- Reconnection cap: Added `MAX_RECONNECTS = 50` (~100 min at 2-minute timeouts). Errors with a clear message when exhausted.
All CI checks pass (unit tests on both ubuntu and windows, build, E2E).
