
Streaming pipeline chunk-emitting #562

Open
aram356 wants to merge 69 commits into main from specs/streaming-response-optimization

Conversation


@aram356 aram356 commented Mar 25, 2026

Summary

  • Design spec for streaming HTTP responses through the publisher proxy when Next.js is disabled
  • Targets both TTFB/TTLB improvement and peak memory reduction (from ~4x response size to constant)
  • Two implementation steps: (1) make pipeline chunk-emitting, (2) wire up Fastly StreamingBody
  • Streaming gate: only activates when html_post_processors() is empty (Next.js disabled)
  • Verified against Fastly SDK 0.11.12 API (stream_to_client, StreamingBody, Request::from_client)

Stacked on #560 (docs reorganization).
Closes #565

Test plan

  • Spec reviewed for technical accuracy against codebase
  • Fastly SDK assumptions verified (no fastly::init, StreamingBody implements Write, abort-on-drop)
  • All compression paths covered (gzip, deflate, brotli — both keep and strip)
  • Error handling covers pre-stream and mid-stream failure modes

aram356 added 3 commits March 25, 2026 08:20
Move spec-like documents from root and docs/internal into a structured
layout under docs/superpowers with timestamped filenames:

- specs/: active design documents (SSC PRD, technical spec, EdgeZero
  migration, auction orchestration flow, production readiness report)
- archive/: completed or historical specs (optimization, sequence,
  publisher IDs audit)
Design spec for streaming HTTP responses through the publisher proxy
when Next.js is disabled. Covers two implementation steps:

1. Make the streaming pipeline chunk-emitting (HtmlRewriterAdapter,
   gzip, encoder finalization)
2. Wire up Fastly StreamingBody via stream_to_client() with entry
   point migration from #[fastly::main] to undecorated main()

Includes streaming gate logic, error handling, rollback strategy,
and testing plan. Verified against Fastly SDK 0.11.12 API.
aram356 added 3 commits March 25, 2026 10:57
Add before/after measurement protocol using Chrome DevTools MCP tools:
network timing capture, Lighthouse audits, performance traces, memory
snapshots, and automated body hash comparison for correctness.
Base automatically changed from docs/organize-specs-and-archive to main March 26, 2026 15:57
aram356 added 17 commits March 26, 2026 08:58
…ation test

- Add debug-level logging to process_chunks showing total bytes
  read and written per pipeline invocation
- Add brotli-to-brotli round-trip test to cover the into_inner()
  finalization path
- Add test proving HtmlWithPostProcessing accumulates output when
  post-processors are registered while streaming path passes through
- Group std imports together (cell, io, rc) before external crates
- Document supported compression combinations on PipelineConfig
- Remove dead-weight byte counters from process_chunks hot loop
- Fix stale comment referencing removed process_through_compression
- Fix brotli finalization: use drop(encoder) instead of into_inner()
  to make the intent clear (CompressorWriter writes trailer on drop)
- Document reset() as no-op on HtmlRewriterAdapter (single-use)
- Add brotli round-trip test covering into_inner finalization path
- Add gzip HTML rewriter pipeline test (compressed round-trip with
  lol_html, not just StreamingReplacer)
- Add HtmlWithPostProcessing accumulation vs streaming behavior test
- Add Eq derive to Compression enum (all unit variants, trivially correct)
- Brotli finalization: use into_inner() instead of drop() to skip
  redundant flush and make finalization explicit
- Document process_chunks flush semantics: callers must still call
  encoder-specific finalization after this method returns
- Warn when HtmlRewriterAdapter receives data after finalization
  (rewriter already consumed, data would be silently lost)
- Make HtmlWithPostProcessing::reset() a true no-op matching its doc
  (clearing auxiliary state without resetting rewriter is inconsistent)
- Document extra copying overhead on post-processor path vs streaming
- Assert output content in reset-then-finalize test (was discarded)
- Relax per-chunk emission test to not depend on lol_html internal
  buffering behavior — assert concatenated correctness + at least one
  intermediate chunk emitted
aram356 added 22 commits April 8, 2026 11:47
…ation' into feature/streaming-pipeline-phase2
Accumulate text fragments via Mutex<String> until
last_in_text_node is true, then process the complete text.
Intermediate fragments return RemoveNode to suppress output.
Accumulate text fragments via Mutex<String> until
last_in_text_node is true, then match and rewrite on the complete
text. Non-GTM scripts that were fragmented are emitted unchanged.
All script rewriters (NextJS __NEXT_DATA__, GTM) are now
fragment-safe — they accumulate text internally until
last_in_text_node. The buffered adapter workaround is no longer
needed. Always use streaming mode in create_html_processor.
When rewrite_structured returns Keep on accumulated content,
intermediate fragments were already removed via RemoveNode. Emit
the full accumulated content via Replace to prevent silent data
loss. Also updates spec to reflect Phase 3 completion.
- Add response.get_status().is_success() check to streaming gate so
  4xx/5xx error pages stay buffered with complete status codes
- Add streaming gate unit tests covering all gate conditions
- Add stream_publisher_body gzip round-trip test
- Add small-chunk (32 byte) pipeline tests for __NEXT_DATA__ and GTM
  that prove fragmented text nodes survive the real lol_html path
Phase 3 performance results: 35% TTFB improvement, 37% DOM Complete
improvement on getpurpose.ai staging vs production. Phase 4 adds
binary pass-through streaming via PublisherResponse::PassThrough.
- Extract streaming gate into can_stream_response() function so tests
  call production code instead of reimplementing the formula
- Refactor GTM rewrite() to use Option<String> pattern instead of
  uninit variable, replacing indirect text.len() != content.len()
  accumulation check with explicit full_content.is_some()
- Add cross-element safety doc comment on accumulated_text fields
  in GTM and NextJsNextDataRewriter
- Document RSC placeholder deliberate non-accumulation strategy
- Update spec to reflect script rewriters are now fragment-safe
- Document why Mutex<String> is used (Sync bound on trait, not
  concurrent access) in both NextJsNextDataRewriter and
  GoogleTagManagerIntegration
- Add accumulation_buffer_drains_between_consecutive_script_elements
  test proving the buffer doesn't leak between two sequential
  <script> elements (fragmented GTM followed by fragmented non-GTM)
Non-processable 2xx responses (images, fonts, video) now stream
directly to the client via PublisherResponse::PassThrough instead
of buffering the entire body in memory. Content-Length is preserved
since the body is unmodified.
Tests verify non-processable 2xx responses return PassThrough,
non-processable errors stay Buffered, and processable content
goes through Stream (not PassThrough).
Adds pass_through_preserves_body_and_content_length test that
verifies io::copy produces identical output and Content-Length
is preserved. Updates handle_publisher_request doc to describe
all three response variants.
- Exclude 204 No Content from PassThrough (must not have body)
- Remove Content-Length before streaming (stream_to_client uses
  chunked encoding, keeping both violates HTTP spec)
- Add tests for 204 exclusion and empty-host interaction
- Update doc comment and byte-level test to reflect CL removal
PassThrough reattaches the unmodified body and uses send_to_client()
instead of stream_to_client() + io::copy. This preserves
Content-Length (avoids chunked encoding overhead for images/fonts)
and lets Fastly stream from its internal buffer without WASM memory
buffering.
- Fix PassThrough doc comment operation order (set_body before finalize)
- Update function doc to describe actual PassThrough flow (set_body +
  send_to_client, not io::copy)
- Remove dead _enters_early_return variable, replace with comment
When the origin returns a processable 2xx response with an encoding the
pipeline cannot decompress (e.g. `zstd` from a misbehaving origin), the
buffered fallback previously still routed the body through
process_response_streaming. `Compression::from_content_encoding` maps
unknown values to `None`, so the rewriter would treat the compressed
bytes as identity-encoded text and emit garbled output.

Bypass the rewrite pipeline entirely in that case and return the origin
response untouched. Adds a test asserting byte-for-byte pass-through
and updates the is_supported_content_encoding doc to reflect the new
behavior.

Addresses PR #585 review feedback from @prk-Jr.
…eaming-pipeline-phase2

# Conflicts:
#	crates/trusted-server-core/src/publisher.rs
- Clarify PassThrough variant doc: finalize_response() and send_to_client()
  are applied at the outer dispatch level, not in this arm
- Hoist status outside the early-return block and reuse is_success to
  eliminate the duplicate get_status() call
- Exclude 205 Reset Content alongside 204 No Content per RFC 9110 §15.3.6;
  add pass_through_gate_excludes_205_reset_content test
- Log binary pass-through before returning to aid production tracing
@aram356 changed the title from "Phase 1: Spec + Make streaming pipeline chunk-emitting" to "Streaming pipeline chunk-emitting" on Apr 17, 2026
/// No-op. `HtmlRewriterAdapter` is single-use: the rewriter consumes its
/// [`Settings`](lol_html::Settings) on construction and cannot be recreated.
/// Calling [`process_chunk`](StreamProcessor::process_chunk) after
/// [`process_chunk`](StreamProcessor::process_chunk) with `is_last = true`

The doc comment has a copy-paste error on the second intra-doc link:

/// Calling [`process_chunk`](StreamProcessor::process_chunk) after
/// [`process_chunk`](StreamProcessor::process_chunk) with `is_last = true`

The second occurrence should describe what happens before the call, not repeat the same method name. Something like:

/// Calling [`process_chunk`](StreamProcessor::process_chunk) after
/// [`reset`](StreamProcessor::reset) or after `is_last = true` will produce empty output.

Or simpler:

/// Calling [`process_chunk`](StreamProcessor::process_chunk) after finalization
/// (`is_last = true`) will produce empty output — the rewriter is already done.

assert!(
encoding_supported && (!is_html || !has_post_processors),
"should stream 2xx HTML without post-processors"
);

These gate tests validate boolean expressions over local variables, not the actual can_stream logic in handle_publisher_request. If the gate formula in the function changes (e.g. a condition is added or the boolean is restructured), these tests will still pass.

Consider replacing the boolean-expression tests with an integration-level test that invokes handle_publisher_request with a real response and asserts which variant of PublisherResponse is returned. That way the tests actually exercise the gate, not just a re-statement of its formula.

The tests for 204/205 exclusion in the PassThrough gate are similarly speculative — they replay the conditional logic rather than asserting on the function's observable output.

.unwrap_or_default()
.to_lowercase();
if !should_process || request_host.is_empty() || !is_success {
log::debug!(

When should_process is true and request_host.is_empty() is true, the code enters this block and falls through to Buffered(response) at line 503 — silently returning the origin's processable content (HTML, JS, CSS) unmodified without URL rewriting. This was the same behavior before, but a log::warn! would make misconfiguration (missing Host header) visible in logs. The current log::debug! means a mis-proxied page with unrewritten URLs would be invisible at production log levels.


@ChristianPavilonis ChristianPavilonis left a comment


Second-pass review. Previous comments resolved: both prior inline comments are addressed — the has_html_post_processors() optimization is present in registry.rs and the streaming plan doc was updated.

New findings:

  1. Doc comment copy-paste in HtmlRewriterAdapter::reset() (commented inline). Minor, but renders incorrect in rustdoc.

  2. Gate tests validate boolean expressions, not function behavior (commented inline). The streaming_gate_* and pass_through_gate_* tests reproduce the conditional logic from handle_publisher_request as local variables and assert on those variables. They pass even if the gate in the actual function is changed. The unsupported_encoding_response_is_returned_unmodified test (which uses a real Response) is the right model — the gate tests should follow that pattern using handle_publisher_request returning a PublisherResponse variant.

  3. Missing WARN log when processable content is skipped due to empty request host (commented inline). The combined condition !should_process || request_host.is_empty() || !is_success logs at DEBUG for all three cases. An empty request_host with processable content is a misconfiguration that silently returns unrewritten HTML/JS to the client. Worth a separate warn branch.

No blocking issues. The core streaming architecture — chunk-emitting HtmlRewriterAdapter, unified process_chunks loop, PublisherResponse enum, and the pass-through/stream/buffered routing in main.rs — is sound. The Mutex accumulation buffers in GTM and NextJS rewriters are correctly drained on the final fragment path. The brotli encoder finalization via flush() inside process_chunks followed by into_inner() is correct. Content-Length handling is consistent across all three response paths.


@ChristianPavilonis ChristianPavilonis left a comment


Findings from second pass are non-blocking — approving.



Development

Successfully merging this pull request may close these issues.

  • Create spec and plan for streaming response for non-Next.js publisher proxy
  • Implement streaming response optimization for non-Next.js publisher proxy

3 participants