Optimize frame reading throughput #24

Merged: samuel-williams-shopify merged 4 commits into socketry:main from wtn:perf on Apr 16, 2026

@wtn commented Apr 15, 2026

Reduce per-frame overhead on the read path.

On a local benchmark (25K text frames, ~83 B average payload, Unix socket pair, IO::Stream::Buffered), throughput improved by ~26%.

List of Changes

  • Read both header bytes in a single stream.read(2) instead of two separate reads
  • Combine extended length + mask into one read for masked frames (stream.read(6) for a 2-byte length plus 4-byte mask, or stream.read(12) for an 8-byte length plus 4-byte mask)
  • Use getbyte/unpack1 instead of unpack("C").first
  • Short-circuit unpack_frames for single-frame messages (to avoid map/join)
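
The first and third items can be sketched in isolation. This is a minimal illustration under assumptions, not the code from this PR: `read_header_sketch` is a hypothetical helper, and the bit layout follows the base framing described in RFC 6455, section 5.2.

```ruby
require "stringio"

# Hypothetical helper (not the library's API): parse the two fixed header
# bytes of a WebSocket frame from a single read instead of two.
def read_header_sketch(stream)
  header = stream.read(2)
  return nil if header.nil? || header.bytesize < 2

  # String#getbyte returns an Integer directly, with no intermediate Array
  # as with unpack("C").first.
  first = header.getbyte(0)
  second = header.getbyte(1)

  {
    finished: (first & 0b1000_0000) != 0,
    opcode: first & 0b0000_1111,
    masked: (second & 0b1000_0000) != 0,
    length_code: second & 0b0111_1111,
  }
end

# Unmasked, final text frame carrying "hi":
# 0x81 = FIN + opcode 1 (text), 0x02 = payload length 2.
stream = StringIO.new("\x81\x02hi".b)
header = read_header_sketch(stream)
payload = stream.read(header[:length_code])
```

The sketch only handles the 1-byte length case; length codes 126 and 127 need the extended-length read covered by the second item.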

Types of Changes

  • Performance improvement.

Contribution

@samuel-williams-shopify

We independently verified the changes in this PR by re-applying each commit
one at a time from main, running a reproducible benchmark after every step,
and confirming the improvement before proceeding to the next.

Environment

  • Ruby 4.0.2 (arm64-darwin)
  • Benchmark: in-process StringIO-backed framer, median of 7 repetitions × 40,000 frames per scenario (200 frames for large payloads)
  • Primary metric: median µs/frame, averaged across four common scenarios
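
A harness of the kind described above might look roughly like the following. It is a sketch under assumptions, not the benchmark that produced the numbers below: `median_us_per_frame` is a hypothetical helper, and the inline parsing loop is a simplified stand-in for the real framer that only handles unmasked frames with a 1-byte length.

```ruby
require "stringio"

# Hypothetical harness: runs the block `repetitions` times, converts each
# elapsed time to microseconds per frame, and returns the median sample.
# Assumes an odd number of repetitions so the median is a single element.
def median_us_per_frame(repetitions:, frames:)
  samples = Array.new(repetitions) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed * 1_000_000 / frames
  end
  samples.sort[samples.size / 2]
end

frame = "\x81\x02hi".b          # unmasked text frame, 2-byte payload
frame_count = 1_000
data = frame * frame_count

result = median_us_per_frame(repetitions: 7, frames: frame_count) do
  stream = StringIO.new(data)
  parsed = 0
  # StringIO#read(2) returns nil at end of input, which ends the loop.
  while (header = stream.read(2))
    stream.read(header.getbyte(1) & 0x7F)
    parsed += 1
  end
  raise "unexpected frame count" unless parsed == frame_count
end
```

Taking the median rather than the mean keeps a single slow repetition (for example, one that coincides with a GC pause) from skewing the result.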

Scenarios

| Scenario | Payload | Length encoding | Masked |
|---|---|---|---|
| small unmasked | 13 B | 1 byte | no |
| small masked | 13 B | 1 byte | yes |
| medium unmasked | 200 B | 2 bytes (126) | no |
| medium masked | 200 B | 2 bytes (126) | yes |

Large frames (70 kB, 8-byte length encoding, marker 127) were also exercised but excluded from
the primary metric because they are dominated by payload I/O rather than parsing overhead.
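
To make the length encodings concrete, here is a sketch of how input for these scenarios could be built. `encode_text_frame_sketch` is a hypothetical encoder written for illustration (layout per RFC 6455, section 5.2), and 70,000 bytes is an assumed stand-in for the "70 kB" payload.

```ruby
# Hypothetical encoder, for illustration only: builds a final text frame.
# `mask` is nil or a 4-byte masking key.
def encode_text_frame_sketch(payload, mask: nil)
  payload = payload.b
  length = payload.bytesize
  mask_bit = mask ? 0x80 : 0x00

  length_bytes =
    if length < 126
      [mask_bit | length].pack("C")           # length fits in 7 bits
    elsif length <= 0xFFFF
      [mask_bit | 126, length].pack("Cn")     # marker 126 + 16-bit length
    else
      [mask_bit | 127, length].pack("CQ>")    # marker 127 + 64-bit length
    end

  header = [0x81].pack("C") + length_bytes    # 0x81 = FIN + opcode 1 (text)
  return header + payload unless mask

  # Masking XORs each payload byte with the key byte at (index mod 4).
  key = mask.bytes
  masked = payload.each_byte.with_index.map { |byte, i| byte ^ key[i % 4] }.pack("C*")
  header + mask.b + masked
end

mask = "\x01\x02\x03\x04".b
small         = encode_text_frame_sketch("x" * 13)               # 2 + 13 bytes
small_masked  = encode_text_frame_sketch("x" * 13, mask: mask)   # 2 + 4 + 13 bytes
medium        = encode_text_frame_sketch("x" * 200)              # 4 + 200 bytes
medium_masked = encode_text_frame_sketch("x" * 200, mask: mask)  # 4 + 4 + 200 bytes
large         = encode_text_frame_sketch("x" * 70_000)           # 10 + 70,000 bytes
```

The header grows from 2 bytes to 4 or 10 as the payload crosses the 126-byte and 65,536-byte thresholds, with 4 more bytes when masked, which is why the scenarios separate length encoding from masking.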

Results

| Step | Change | µs/frame | vs main |
|---|---|---|---|
| Baseline (main) | | 0.644 | |
| 1 | parse_header: getbyte(0) instead of unpack("C").first; integer comparisons instead of Range#include? | 0.561 | −12.8% |
| 2 | Framer#read_frame: read both header bytes in a single stream.read(2) call; pass second byte directly to Frame.read, eliminating a redundant read | 0.449 | −30.3% |
| 3 | Frame.read: unpack1 instead of .unpack(...).first; combined length+mask read for masked extended frames (one 6-byte or 12-byte read instead of two separate reads) | 0.433 | −32.7% |
| 4 | Connection#unpack_frames: fast-path for the common single-frame case, avoiding map + join | 0.431 | −33.1% |
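
The combined read in step 3 can be illustrated as follows. This is a sketch under assumptions rather than the actual Frame.read: `read_length_and_mask_sketch` is a hypothetical helper, it takes the already-read second header byte as an argument, and it omits end-of-stream handling for brevity.

```ruby
require "stringio"

# Hypothetical helper: returns [payload_length, mask_or_nil] using one
# stream.read call in every branch.
def read_length_and_mask_sketch(stream, second_byte)
  masked = (second_byte & 0x80) != 0
  length = second_byte & 0x7F

  case length
  when 126
    if masked
      buffer = stream.read(6)    # 2-byte length + 4-byte mask in one call
      [buffer.unpack1("n"), buffer.byteslice(2, 4)]
    else
      [stream.read(2).unpack1("n"), nil]
    end
  when 127
    if masked
      buffer = stream.read(12)   # 8-byte length + 4-byte mask in one call
      [buffer.unpack1("Q>"), buffer.byteslice(8, 4)]
    else
      [stream.read(8).unpack1("Q>"), nil]
    end
  else
    [length, masked ? stream.read(4) : nil]
  end
end

mask = "\x01\x02\x03\x04".b
stream = StringIO.new([200].pack("n") + mask + "payload bytes follow")
length, key = read_length_and_mask_sketch(stream, 0x80 | 126)
```

`unpack1` returns the first decoded value directly, so neither branch allocates the one-element Array that `.unpack(...).first` would.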

Per-scenario breakdown (baseline vs final)

| Scenario | main | This PR | Δ |
|---|---|---|---|
| small unmasked | 0.572 µs | 0.368 µs | −35.6% |
| small masked | 0.635 µs | 0.429 µs | −32.4% |
| medium unmasked | 0.654 µs | 0.436 µs | −33.4% |
| medium masked | 0.714 µs | 0.482 µs | −32.5% |

Notes

  • Step 2 is the dominant win (roughly half of the total improvement: 0.112 of
    the 0.213 µs/frame saved). Eliminating one stream.read call per frame, by
    reading the 2-byte header in a single call and passing the second byte
    directly into Frame.read, is the single biggest change.
  • All improvements were consistent across payload sizes and mask/no-mask variants,
    indicating the gains are in parsing overhead rather than payload handling.
  • The test suite was extended with 5 new cases (masked medium/large messages,
    pre-read second_byte, reserved opcode rejection) and all 92 tests pass.
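
For reference, the single-frame fast path from step 4 amounts to something like the sketch below. The names `FrameSketch` and `join_payloads_sketch` are hypothetical and do not reflect the library's actual Connection#unpack_frames.

```ruby
# Hypothetical stand-in for a frame object that exposes its payload.
FrameSketch = Struct.new(:payload)

def join_payloads_sketch(frames)
  # Fast path: a message made of one frame needs no intermediate Array
  # (from map) and no newly allocated String (from join).
  return frames.first.payload if frames.size == 1

  frames.map(&:payload).join
end

single = [FrameSketch.new("hello")]
multi = [FrameSketch.new("hello "), FrameSketch.new("world")]

single_result = join_payloads_sketch(single)  # the frame's own String
multi_result = join_payloads_sketch(multi)    # a new joined String
```

One trade-off in this sketch: the fast path returns the frame's own String object rather than a copy, so a caller that mutates the result would also mutate the frame's payload.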

@samuel-williams-shopify merged commit ae76bd5 into socketry:main on Apr 16, 2026
18 of 21 checks passed
@ioquatix added this to the v0.21.0 milestone on Apr 16, 2026
@wtn deleted the perf branch on April 16, 2026 14:49