Feat/optimize html streaming #351

Draft
prk-Jr wants to merge 11 commits into main from feat/optimize-html-streaming
prk-Jr commented Feb 20, 2026

Summary

This PR fundamentally shifts the Trusted Server’s publisher proxy path from a fully buffered model to a chunked streaming architecture, delivering Phase 1 and 2.1 of the performance optimization plan.

By replacing response buffering with Fastly’s stream_to_client() API and optimizing the lol_html output pipeline for true incremental streaming, response headers and initial HTML chunks are now dispatched to the client as soon as they are processed. This significantly reduces Time-To-First-Byte (TTFB) and unblocks early client-side subresource discovery.
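
The difference between the buffered and streaming models can be illustrated with a minimal, std-only sketch. It is not the code from this PR: `stream_chunks` and its parameters are hypothetical names, and a generic `Write` stands in for the body handle returned by stream_to_client(). The point is only that each processed chunk is forwarded before the next one is read.

```rust
use std::io::{self, Read, Write};

/// Reads `input` in fixed-size chunks, applies `transform` to each chunk,
/// and writes the result to `output` right away instead of collecting the
/// whole body first. Returns the total number of bytes written.
///
/// Assumption: `output` is a stand-in for a streaming body; any `Write` works.
pub fn stream_chunks<R, W, F>(
    mut input: R,
    mut output: W,
    chunk_size: usize,
    mut transform: F,
) -> io::Result<usize>
where
    R: Read,
    W: Write,
    F: FnMut(&[u8]) -> Vec<u8>,
{
    if chunk_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "chunk_size must be greater than zero",
        ));
    }

    let mut buf = vec![0u8; chunk_size];
    let mut written = 0;
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of the origin body
        }
        let out = transform(&buf[..n]);
        // The chunk leaves the pipeline before the next read happens.
        output.write_all(&out)?;
        written += out.len();
    }
    output.flush()?;
    Ok(written)
}
```

A per-chunk transform like the closure here only suits stateless rewrites; an HTML rewriter has to carry parser state across chunk boundaries, which is why the real pipeline goes through lol_html.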

Additionally, WASM hostcalls have been batched to improve throughput and reduce memory pressure.
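
To make the batching effect concrete, here is a hedged sketch using only the standard library. `CountingSink` and `write_chunks` are hypothetical names; the sink counts `write` calls as a rough proxy for guest-to-host boundary crossings and does not model the actual Fastly hostcall interface.

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for a host-backed body. Each `write` call represents one
/// guest-to-host crossing in this model (an assumption for illustration).
#[derive(Default)]
pub struct CountingSink {
    pub calls: usize,
    pub bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `chunk` to a fresh sink `count` times, optionally through a
/// `BufWriter` of the given capacity, and returns the sink for inspection.
pub fn write_chunks(
    chunk: &[u8],
    count: usize,
    buffer_capacity: Option<usize>,
) -> io::Result<CountingSink> {
    let sink = CountingSink::default();
    match buffer_capacity {
        Some(capacity) => {
            let mut writer = BufWriter::with_capacity(capacity, sink);
            for _ in 0..count {
                writer.write_all(chunk)?;
            }
            // `into_inner` flushes whatever is still buffered.
            writer.into_inner().map_err(|e| e.into_error())
        }
        None => {
            let mut writer = sink;
            for _ in 0..count {
                writer.write_all(chunk)?;
            }
            Ok(writer)
        }
    }
}
```

With 100 chunks of 64 bytes, `write_chunks(&chunk, 100, None)` reaches the sink 100 times, while `write_chunks(&chunk, 100, Some(8 * 1024))` delivers the same bytes in far fewer calls because the total stays under the buffer capacity until the final flush.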


Key Changes

  • stream_to_client() Integration (publisher.rs)
    Replaced fully buffered response collection with stream_to_client() to enable immediate header dispatch and incremental chunk streaming.

  • lol_html Output Pipeline (streaming_processor.rs)
    Refactored the HtmlRewriter adapter to implement the OutputSink trait using a shared Rc<RefCell<Vec<u8>>> buffer, enabling true incremental streaming.

  • Buffer Pre-allocation
    Replaced std::mem::take with Vec::with_capacity and std::mem::replace to eliminate reallocation churn during chunk processing.

  • WASM Hostcall Batching
    Wrapped the StreamingBody output in an 8 KB std::io::BufWriter to reduce expensive WASM-to-host boundary crossings.

  • Code Health

    • Resolved associated Clippy warnings
    • Added # Errors documentation sections to the streaming handler
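
The shared-buffer sink and the buffer pre-allocation described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the contents of streaming_processor.rs: the `OutputSink` trait is redeclared locally with the same shape as lol_html's so the example compiles without the crate, and `SharedSink`, `ChunkDrain`, and their methods are hypothetical names.

```rust
use std::cell::RefCell;
use std::mem;
use std::rc::Rc;

/// Local stand-in with the same shape as lol_html's `OutputSink` trait.
pub trait OutputSink {
    fn handle_chunk(&mut self, chunk: &[u8]);
}

/// Hypothetical sink: appends rewriter output to a buffer shared with the caller.
pub struct SharedSink {
    buffer: Rc<RefCell<Vec<u8>>>,
}

impl OutputSink for SharedSink {
    fn handle_chunk(&mut self, chunk: &[u8]) {
        self.buffer.borrow_mut().extend_from_slice(chunk);
    }
}

/// Hypothetical caller-side handle that drains the shared buffer after
/// each input chunk has been fed to the rewriter.
pub struct ChunkDrain {
    buffer: Rc<RefCell<Vec<u8>>>,
    capacity: usize,
}

impl ChunkDrain {
    /// Creates the drain and the sink that shares its buffer.
    pub fn new(capacity: usize) -> (Self, SharedSink) {
        let buffer = Rc::new(RefCell::new(Vec::with_capacity(capacity)));
        let sink = SharedSink {
            buffer: Rc::clone(&buffer),
        };
        (Self { buffer, capacity }, sink)
    }

    /// Takes everything written so far and leaves a pre-sized buffer behind.
    /// `mem::take` would leave an empty `Vec` with zero capacity, so the
    /// next chunk would have to grow it again from scratch.
    pub fn drain(&self) -> Vec<u8> {
        mem::replace(
            &mut *self.buffer.borrow_mut(),
            Vec::with_capacity(self.capacity),
        )
    }

    /// Capacity of the buffer currently waiting for output.
    pub fn buffered_capacity(&self) -> usize {
        self.buffer.borrow().capacity()
    }
}
```

In this sketch the rewriter would own the `SharedSink` while the streaming loop keeps the `ChunkDrain`, calling `drain()` after each input chunk and forwarding the returned bytes to the client. `Rc<RefCell<..>>` is enough because both sides live on one thread and never hold a borrow across each other's calls.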

Test Plan

  • Local Unit & Workspace Tests
    Run:

    cargo test --workspace

    to ensure all functionality remains intact.

  • TypeScript Bundle Build
    Run:

    npm run build

    in crates/js/lib to verify successful generation of integration modules.

  • Local Fastly Simulation
    Run:

    fastly compute serve

    Verify:

    • Headers are correctly injected on streamed responses
    • Proxy behavior remains correct
    • Baseline TTFB improvements (e.g., via curl)

  • Staging Load Testing
    Execute:

    ./scripts/benchmark.sh

    against staging to quantify external TTFB and Time-to-Last-Byte (TTLB) improvements under concurrent traffic.

prk-Jr and others added 11 commits February 18, 2026 21:33
Introduce RequestTimer for per-request phase tracking (init, backend,
process, total) exposed via Server-Timing response headers. Add
benchmark tooling with --profile mode for collecting timing data.
Document phased optimization plan covering streaming architecture,
code-level fixes, and open design questions for team review.

Introduce RequestTimer for per-request phase tracking (init, backend,
process, total) exposed via Server-Timing response headers. Add
benchmark tooling with --profile mode for collecting timing data.
Document phased optimization plan covering streaming architecture,
code-level fixes, and open design questions for team review.

RequestTimer and Server-Timing header were premature; WASM guest
profiling via profile.sh gives better per-function visibility without
runtime overhead. Also strips dead --profile mode from benchmark.sh.

build.rs already resolves trusted-server.toml + env vars at compile time
and embeds the result. Replace Settings::from_toml() with direct
toml::from_str() to skip the config crate pipeline on every request.
Profiling confirms: ~5-8% → ~3.3% CPU per request.

- OPTIMIZATION.md: profiling results, CPU breakdown, phased optimization
  plan covering streaming fixes, config crate elimination, and
  stream_to_client() architecture
- scripts/profile.sh: WASM guest profiling via --profile-guest with
  Firefox Profiler-compatible output
- scripts/benchmark.sh: TTFB analysis, cold start detection, endpoint
  latency breakdown, and load testing with save/compare support

…ding HTML and RSC Flight URL rewriting, to avoid full-body buffering
prk-Jr self-assigned this Feb 20, 2026