(improvement) Eliminate extra BytesIO allocation in encode_message compression path (80-300ns savings, 17-32%) #800

Draft
mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/encode-message-bytesio

Conversation

@mykaul mykaul commented Apr 6, 2026

Summary

Unify the two BytesIO allocations in encode_message into a single buff = io.BytesIO() declared once before the compression branch. In the compression path (protocol v4 and below), the body is written into buff, extracted via getvalue(), compressed, and returned via direct bytes concatenation (header + compressed_body). The non-compression path reuses the same buff with the existing seek-based header reservation pattern.

Before (master): 2 BytesIO() allocations on the compression path (buff + body), 1 on the non-compression path.
After: 1 BytesIO() allocation on both paths.
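The unified-buffer pattern described above can be sketched roughly as follows. This is a simplified illustration, not the driver's actual code: `encode_message_sketch`, `msg_body_writer`, `make_header`, and the `compressor` callable are hypothetical stand-ins for the real internals in `cassandra/protocol.py`.

```python
import io

HEADER_SIZE = 9  # illustrative: size of the reserved frame header region

def encode_message_sketch(msg_body_writer, make_header, compressor=None):
    """Single-buffer encoding sketch (hypothetical names, not the real API)."""
    buff = io.BytesIO()                  # one allocation on both paths
    if compressor is not None:
        msg_body_writer(buff)            # write uncompressed body into buff
        compressed_body = compressor(buff.getvalue())
        header = make_header(len(compressed_body))
        return header + compressed_body  # plain bytes concat, no second BytesIO
    # Non-compression path: reserve header space, write body, backfill header.
    buff.seek(HEADER_SIZE)
    msg_body_writer(buff)
    body_len = buff.tell() - HEADER_SIZE
    buff.seek(0)
    buff.write(make_header(body_len))
    return buff.getvalue()
```

The compression branch never needs the seek-based header reservation, since the compressed length is only known after compression; concatenating two bytes objects there is cheaper than allocating a second `BytesIO`.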

Motivation

For protocol v4 and below, compression happens at the message level (v5+ uses segment-level compression with checksumming). The original code created a separate body = io.BytesIO() for the uncompressed payload, then copied the result into a second buff = io.BytesIO() before writing the header. This second buffer is unnecessary — we can write the body into buff directly, extract it, compress, and concatenate the header as bytes.

Benchmark (CPython 3.14, per-call, two runs on a quiet machine)

| Body size | Original    | Optimized  | Savings per call    |
|-----------|-------------|------------|---------------------|
| 64B       | 403-416 ns  | 305-317 ns | 86-111 ns (21-27%)  |
| 1KB       | 556-561 ns  | 422-437 ns | 119-138 ns (21-25%) |
| 16KB      | 865-1122 ns | 718-761 ns | 148-361 ns (17-32%) |

Applies to every compressed message on protocol v4 and below.
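The PR does not include its benchmark script, but a harness producing per-call numbers like those above could look roughly like this. The two encode variants below are simplified stand-ins (hypothetical names, `zlib` in place of whatever compressor the real benchmark used) that isolate the one-allocation-vs-two difference:

```python
import io
import timeit
import zlib

def encode_two_buffers(payload):
    # Old pattern: body written to one BytesIO, result copied into a second.
    body = io.BytesIO()
    body.write(payload)
    compressed = zlib.compress(body.getvalue())
    buff = io.BytesIO()
    buff.write(b"\x00" * 9)  # header placeholder
    buff.write(compressed)
    return buff.getvalue()

def encode_one_buffer(payload):
    # New pattern: single BytesIO for the body, bytes concat for the header.
    buff = io.BytesIO()
    buff.write(payload)
    compressed = zlib.compress(buff.getvalue())
    return b"\x00" * 9 + compressed

for size in (64, 1024, 16 * 1024):
    payload = b"x" * size
    for fn in (encode_two_buffers, encode_one_buffer):
        per_call = min(timeit.repeat(lambda: fn(payload), number=2_000, repeat=3)) / 2_000
        print(f"{size:>6}B {fn.__name__}: {per_call * 1e9:.0f} ns")
```

Note that absolute numbers from such a harness are dominated by the compressor itself; only the delta between the two variants reflects the saved allocation.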

Changes

  • cassandra/protocol.py: Move buff = io.BytesIO() before the if branch. In the compression path, write body into buff instead of a separate body = io.BytesIO(), extract via getvalue(), compress, and return header + body via bytes concat. Non-compression path uses the same buff with seek(9) header reservation as before.

Testing

Unit tests pass (645 passed, 43 skipped). The non-compression path is structurally unchanged.

@mykaul mykaul marked this pull request as draft April 6, 2026 19:26
mykaul commented Apr 6, 2026

Benchmark results (CPython 3.14, 500k iterations)

| Body size | Original | Optimized | Δ per call |
|-----------|----------|-----------|------------|
| 64B       | 352 ns   | 180 ns    | -172 ns    |
| 1KB       | 384 ns   | 230 ns    | -154 ns    |
| 16KB      | 658 ns   | 446 ns    | -212 ns    |

Applies to protocol v4 and below where compression is at the message level. Eliminates one BytesIO allocation per compressed message by using direct bytes concatenation for the header + compressed body.

…n path

When compression is active (protocol v4 and below), encode_message
previously created two BytesIO objects: one for the uncompressed body,
then another to write the header + compressed body. The second BytesIO
is unnecessary -- use direct bytes concatenation (header + compressed_body)
instead, which avoids one BytesIO allocation per compressed message.

The non-compression path already used a single BytesIO and is unchanged.
@mykaul mykaul force-pushed the perf/encode-message-bytesio branch from f3f51d3 to 2b4e88b Compare April 6, 2026 20:25
@mykaul mykaul changed the title from "(improvement) Eliminate extra BytesIO allocation in encode_message compression path" to "(improvement) Eliminate extra BytesIO allocation in encode_message compression path (80-300ns savings, 17-32%)" Apr 7, 2026
mykaul commented Apr 10, 2026

Follow-up commit: inline has_checksumming_support check

Commit 04e4ba5d7: perf: inline has_checksumming_support check to avoid classmethod call overhead

Change

Replaced ProtocolVersion.has_checksumming_support(protocol_version) calls in encode_message and decode_message with inline integer comparisons using pre-computed module-level constants (_CHECKSUMMING_MIN_VERSION, _CHECKSUMMING_MAX_VERSION). This avoids the classmethod dispatch overhead on every encode/decode call.

Semantics are identical: V5 <= version < DSE_V1.
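The pattern can be sketched as below. The constant names match the ones mentioned in this comment, but the version values and the `ProtocolVersion` class here are illustrative stand-ins, not the driver's actual definitions:

```python
# Illustrative protocol version values (stand-ins, not the driver's real ones).
V5 = 5
DSE_V1 = 0x41

class ProtocolVersion:
    @classmethod
    def has_checksumming_support(cls, version):
        # Classmethod dispatch on every call: attribute lookup + bound-method
        # creation + call overhead on the hot encode/decode path.
        return V5 <= version < DSE_V1

# Pre-computed once at module import; the hot path then does two int compares.
_CHECKSUMMING_MIN_VERSION = V5
_CHECKSUMMING_MAX_VERSION = DSE_V1

def needs_checksumming(protocol_version):
    # Inline integer comparison with identical semantics: V5 <= version < DSE_V1.
    return _CHECKSUMMING_MIN_VERSION <= protocol_version < _CHECKSUMMING_MAX_VERSION
```

Because the constants are plain module globals, CPython resolves them with a single `LOAD_GLOBAL` each, avoiding the descriptor machinery behind a classmethod call.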

Benchmark results (benchmarks/bench_checksumming_inline.py)

Python 3.14.3
  classmethod call: 61.3 ns
  inline compare:   25.5 ns
  saving: 35.8 ns/call (2.4x)

This is called twice per message (once on encode, once on decode), saving ~72 ns per round-trip.

Tests

607 unit tests passed, 0 failures.

@mykaul mykaul force-pushed the perf/encode-message-bytesio branch 2 times, most recently from e3572ae to 04e4ba5 Compare April 11, 2026 16:07
… overhead

Replace ProtocolVersion.has_checksumming_support(protocol_version) calls
in encode_message and decode_message with inline integer comparisons
using pre-computed module-level constants. This avoids the classmethod
dispatch overhead on every encode/decode call.

Benchmark:
  classmethod call: 61.3 ns
  inline compare:   25.5 ns
  saving: 35.8 ns/call (2.4x)
@mykaul mykaul force-pushed the perf/encode-message-bytesio branch from 04e4ba5 to 859ddfe Compare April 11, 2026 16:28