(improvement) Eliminate extra BytesIO allocation in encode_message compression path (80-300ns savings, 17-32%) #800

Draft
mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/encode-message-bytesio

Conversation

@mykaul mykaul commented Apr 6, 2026

Summary

Unify the two BytesIO allocations in encode_message into a single buff = io.BytesIO() declared once before the compression branch. In the compression path (protocol v4 and below), the body is written into buff, extracted via getvalue(), compressed, and returned via direct bytes concatenation (header + compressed_body). The non-compression path reuses the same buff with the existing seek-based header reservation pattern.

Before (master): 2 BytesIO() allocations on the compression path (buff + body), 1 on the non-compression path.
After: 1 BytesIO() allocation on both paths.
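The unified-buffer pattern described above can be sketched roughly as follows. This is a simplified illustration, not the driver's actual code: `encode_message_sketch`, `msg_body_writer`, `make_header`, and the `compressor` callable are hypothetical stand-ins for the real internals in `cassandra/protocol.py`.

```python
import io

HEADER_SIZE = 9  # illustrative: size of the reserved frame header region

def encode_message_sketch(msg_body_writer, make_header, compressor=None):
    """Single-buffer encoding sketch (hypothetical names, not the real API)."""
    buff = io.BytesIO()                  # one allocation on both paths
    if compressor is not None:
        msg_body_writer(buff)            # write uncompressed body into buff
        compressed_body = compressor(buff.getvalue())
        header = make_header(len(compressed_body))
        return header + compressed_body  # plain bytes concat, no second BytesIO
    # Non-compression path: reserve header space, write body, backfill header.
    buff.seek(HEADER_SIZE)
    msg_body_writer(buff)
    body_len = buff.tell() - HEADER_SIZE
    buff.seek(0)
    buff.write(make_header(body_len))
    return buff.getvalue()
```

The compression branch never needs the seek-based header reservation, since the compressed length is only known after compression; concatenating two bytes objects there is cheaper than allocating a second `BytesIO`.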

Motivation

For protocol v4 and below, compression happens at the message level (v5+ uses segment-level compression with checksumming). The original code created a separate body = io.BytesIO() for the uncompressed payload, then copied the result into a second buff = io.BytesIO() before writing the header. This second buffer is unnecessary — we can write the body into buff directly, extract it, compress, and concatenate the header as bytes.

Benchmark (CPython 3.14, per-call, two runs on a quiet machine)

| Body size | Original    | Optimized  | Savings per call    |
|-----------|-------------|------------|---------------------|
| 64B       | 403-416 ns  | 305-317 ns | 86-111 ns (21-27%)  |
| 1KB       | 556-561 ns  | 422-437 ns | 119-138 ns (21-25%) |
| 16KB      | 865-1122 ns | 718-761 ns | 148-361 ns (17-32%) |

Applies to every compressed message on protocol v4 and below.
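The PR does not include its benchmark script, but a harness producing per-call numbers like those above could look roughly like this. The two encode variants below are simplified stand-ins (hypothetical names, `zlib` in place of whatever compressor the real benchmark used) that isolate the one-allocation-vs-two difference:

```python
import io
import timeit
import zlib

def encode_two_buffers(payload):
    # Old pattern: body written to one BytesIO, result copied into a second.
    body = io.BytesIO()
    body.write(payload)
    compressed = zlib.compress(body.getvalue())
    buff = io.BytesIO()
    buff.write(b"\x00" * 9)  # header placeholder
    buff.write(compressed)
    return buff.getvalue()

def encode_one_buffer(payload):
    # New pattern: single BytesIO for the body, bytes concat for the header.
    buff = io.BytesIO()
    buff.write(payload)
    compressed = zlib.compress(buff.getvalue())
    return b"\x00" * 9 + compressed

for size in (64, 1024, 16 * 1024):
    payload = b"x" * size
    for fn in (encode_two_buffers, encode_one_buffer):
        per_call = min(timeit.repeat(lambda: fn(payload), number=2_000, repeat=3)) / 2_000
        print(f"{size:>6}B {fn.__name__}: {per_call * 1e9:.0f} ns")
```

Note that absolute numbers from such a harness are dominated by the compressor itself; only the delta between the two variants reflects the saved allocation.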

Changes

  • cassandra/protocol.py: Move buff = io.BytesIO() before the if branch. In the compression path, write body into buff instead of a separate body = io.BytesIO(), extract via getvalue(), compress, and return header + body via bytes concat. Non-compression path uses the same buff with seek(9) header reservation as before.

Testing

Unit tests pass (645 passed, 43 skipped). The non-compression path is structurally unchanged.

@mykaul mykaul marked this pull request as draft April 6, 2026 19:26
mykaul commented Apr 6, 2026

Benchmark results (CPython 3.14, 500k iterations)

| Body size | Original | Optimized | Δ per call |
|-----------|----------|-----------|------------|
| 64B       | 352 ns   | 180 ns    | -172 ns    |
| 1KB       | 384 ns   | 230 ns    | -154 ns    |
| 16KB      | 658 ns   | 446 ns    | -212 ns    |

Applies to protocol v4 and below where compression is at the message level. Eliminates one BytesIO allocation per compressed message by using direct bytes concatenation for the header + compressed body.

…n path

When compression is active (protocol v4 and below), encode_message
previously created two BytesIO objects: one for the uncompressed body,
then another to write the header + compressed body. The second BytesIO
is unnecessary -- use direct bytes concatenation (header + compressed_body)
instead, which avoids one BytesIO allocation per compressed message.

The non-compression path already used a single BytesIO and is unchanged.
@mykaul mykaul force-pushed the perf/encode-message-bytesio branch from f3f51d3 to 2b4e88b Compare April 6, 2026 20:25
@mykaul mykaul changed the title from "(improvement) Eliminate extra BytesIO allocation in encode_message compression path" to "(improvement) Eliminate extra BytesIO allocation in encode_message compression path (80-300ns savings, 17-32%)" Apr 7, 2026
mykaul commented Apr 10, 2026

Follow-up commit: inline has_checksumming_support check

Commit 04e4ba5d7: perf: inline has_checksumming_support check to avoid classmethod call overhead

Change

Replaced ProtocolVersion.has_checksumming_support(protocol_version) calls in encode_message and decode_message with inline integer comparisons using pre-computed module-level constants (_CHECKSUMMING_MIN_VERSION, _CHECKSUMMING_MAX_VERSION). This avoids the classmethod dispatch overhead on every encode/decode call.

Semantics are identical: V5 <= version < DSE_V1.
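The pattern can be sketched as below. The constant names match the ones mentioned in this comment, but the version values and the `ProtocolVersion` class here are illustrative stand-ins, not the driver's actual definitions:

```python
# Illustrative protocol version values (stand-ins, not the driver's real ones).
V5 = 5
DSE_V1 = 0x41

class ProtocolVersion:
    @classmethod
    def has_checksumming_support(cls, version):
        # Classmethod dispatch on every call: attribute lookup + bound-method
        # creation + call overhead on the hot encode/decode path.
        return V5 <= version < DSE_V1

# Pre-computed once at module import; the hot path then does two int compares.
_CHECKSUMMING_MIN_VERSION = V5
_CHECKSUMMING_MAX_VERSION = DSE_V1

def needs_checksumming(protocol_version):
    # Inline integer comparison with identical semantics: V5 <= version < DSE_V1.
    return _CHECKSUMMING_MIN_VERSION <= protocol_version < _CHECKSUMMING_MAX_VERSION
```

Because the constants are plain module globals, CPython resolves them with a single `LOAD_GLOBAL` each, avoiding the descriptor machinery behind a classmethod call.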

Benchmark results (benchmarks/bench_checksumming_inline.py)

Python 3.14.3
  classmethod call: 61.3 ns
  inline compare:   25.5 ns
  saving: 35.8 ns/call (2.4x)

This is called twice per message (once on encode, once on decode), saving ~72 ns per round-trip.

Tests

607 unit tests passed, 0 failures.

@mykaul mykaul force-pushed the perf/encode-message-bytesio branch 2 times, most recently from e3572ae to 04e4ba5 Compare April 11, 2026 16:07
… overhead

Replace ProtocolVersion.has_checksumming_support(protocol_version) calls
in encode_message and decode_message with inline integer comparisons
using pre-computed module-level constants. This avoids the classmethod
dispatch overhead on every encode/decode call.

Benchmark:
  classmethod call: 61.3 ns
  inline compare:   25.5 ns
  saving: 35.8 ns/call (2.4x)
@mykaul mykaul force-pushed the perf/encode-message-bytesio branch from 04e4ba5 to 859ddfe Compare April 11, 2026 16:28