Commits
Introduce RequestTimer for per-request phase tracking (init, backend, process, total), exposed via Server-Timing response headers. Add benchmark tooling with --profile mode for collecting timing data. Document a phased optimization plan covering streaming architecture, code-level fixes, and open design questions for team review.
Remove RequestTimer and the Server-Timing header: they were premature, since WASM guest profiling via profile.sh gives better per-function visibility without runtime overhead. Also strip the dead --profile mode from benchmark.sh.
build.rs already resolves trusted-server.toml plus env vars at compile time and embeds the result. Replace Settings::from_toml() with a direct toml::from_str() call to skip the config crate pipeline on every request (a sketch follows the commit list). Profiling confirms: ~5-8% → ~3.3% CPU per request.
- OPTIMIZATION.md: profiling results, CPU breakdown, and a phased optimization plan covering streaming fixes, config crate elimination, and the stream_to_client() architecture
- scripts/profile.sh: WASM guest profiling via --profile-guest with Firefox Profiler-compatible output
- scripts/benchmark.sh: TTFB analysis, cold start detection, endpoint latency breakdown, and load testing with save/compare support
…ding HTML and RSC Flight URL rewriting, to avoid full-body buffering
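A minimal sketch of the direct `toml::from_str()` path referenced in the config commit above, under stated assumptions: the `Settings` fields, the embedded TOML content, and `load_settings()` are illustrative stand-ins, not the crate's actual names, and in the real crate the embedded string is produced by build.rs rather than written inline.

```rust
use serde::Deserialize;

/// Illustrative stand-in for the real `Settings` type; fields are assumptions.
#[derive(Debug, Deserialize)]
struct Settings {
    publisher_origin: String,
    ad_server_url: String,
}

/// Stand-in for the TOML that build.rs resolves (trusted-server.toml plus env
/// overrides) and embeds at compile time; the real mechanism lives in build.rs.
const EMBEDDED_TOML: &str = r#"
publisher_origin = "https://publisher.example"
ad_server_url = "https://ads.example"
"#;

/// The old path went through Settings::from_toml() and the config crate on
/// every request; parsing the embedded string is a single toml::from_str() call.
fn load_settings() -> Result<Settings, toml::de::Error> {
    toml::from_str(EMBEDDED_TOML)
}

fn main() {
    let settings = load_settings().expect("embedded TOML should parse");
    println!("{settings:?}");
}
```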
Summary
This PR fundamentally shifts the Trusted Server's publisher proxy path from a fully buffered model to a chunked streaming architecture, delivering Phases 1 and 2.1 of the performance optimization plan.

By replacing response buffering with Fastly's `stream_to_client()` API and optimizing the `lol_html` output pipeline for true incremental streaming, the proxy now dispatches response headers and initial HTML chunks to the client as soon as they are processed. This significantly reduces Time-To-First-Byte (TTFB) and unblocks early client-side subresource discovery. Additionally, WASM hostcalls are now batched to improve throughput and reduce memory pressure.
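A hedged sketch of the new flow, combining the `stream_to_client()` dispatch and the 8 KB `BufWriter` batching detailed under Key Changes below. The handler name, chunk size, and overall shape are illustrative, and the `lol_html` rewriting step is omitted; this is not the actual `publisher.rs` code.

```rust
use std::io::{BufWriter, Read, Write};

use fastly::{Error, Response};

/// Sketch only: forwards a backend response to the client as it is read,
/// instead of buffering the whole body first.
fn stream_backend_to_client(mut beresp: Response) -> Result<(), Error> {
    // Detach the upstream body; `beresp` keeps the status and headers.
    let mut upstream = beresp.take_body();

    // Send status + headers to the client immediately and get back a
    // writable streaming body for the rest of the response.
    let streaming_body = beresp.stream_to_client();

    // Batch writes through an 8 KB buffer so each WASM hostcall carries a
    // full buffer instead of many small chunks.
    let mut out = BufWriter::with_capacity(8 * 1024, streaming_body);

    let mut chunk = [0u8; 4096];
    loop {
        let n = upstream.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        // The real handler pushes each chunk through the lol_html rewriter
        // before writing; this sketch forwards it unchanged.
        out.write_all(&chunk[..n])?;
    }

    // Flush any remaining buffered bytes and finalize the stream.
    let streaming_body = out.into_inner().map_err(|e| e.into_error())?;
    streaming_body.finish()?;
    Ok(())
}
```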
Key Changes
- `stream_to_client()` Integration (`publisher.rs`): Replaced fully buffered response collection with `stream_to_client()` to enable immediate header dispatch and incremental chunk streaming (sketched above under Summary).
- `lol_html` Output Pipeline (`streaming_processor.rs`): Refactored the `HtmlRewriter` adapter to implement the `OutputSink` trait using a shared `Rc<RefCell<Vec<u8>>>` buffer, enabling true incremental streaming (see the first sketch after this list).
- Buffer Pre-allocation: Replaced `std::mem::take` with `Vec::with_capacity` and `std::mem::replace` to eliminate reallocation churn during chunk processing (see the second sketch after this list).
- WASM Hostcall Batching: Wrapped the `StreamingBody` output in an 8 KB `std::io::BufWriter` to reduce expensive WASM-to-host boundary crossings (also sketched above under Summary).
- Code Health: Added `# Errors` documentation sections to the streaming handler.
Test Plan
- Local Unit & Workspace Tests: run `cargo test --workspace` to ensure all functionality remains intact.
- TypeScript Bundle Build: run the bundle build in `crates/js/lib` to verify successful generation of the integration modules.
- Local Fastly Simulation: run the service locally and verify with `curl` that response headers and initial chunks stream before the full body.
- Staging Load Testing: execute load tests against staging to quantify external TTFB and Time-to-Last-Byte (TTLB) improvements under concurrent traffic (an illustrative probe follows this list).
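For reference, a minimal, illustrative probe (not part of this PR) showing what the staging run measures: the host, port, and plain-HTTP request are placeholders, and a real load test would issue many concurrent requests over TLS.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // Placeholder target; a real run would hit the staging hostname over TLS.
    let host = "staging.example.com";
    let mut stream = TcpStream::connect((host, 80))?;

    let start = Instant::now();
    write!(
        stream,
        "GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    )?;

    // TTFB: elapsed time until the first response byte arrives.
    let mut first = [0u8; 1];
    stream.read_exact(&mut first)?;
    let ttfb = start.elapsed();

    // TTLB: elapsed time until the connection is fully drained.
    let mut rest = Vec::new();
    stream.read_to_end(&mut rest)?;
    let ttlb = start.elapsed();

    println!("TTFB: {ttfb:?}  TTLB: {ttlb:?}  bytes read: {}", 1 + rest.len());
    Ok(())
}
```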