perf: buffer accumulation in _write_query_params() reduces f.write() calls (~100ns improvement, 1.1-1.3x speedup) by mykaul · Pull Request #790 · scylladb/python-driver

mykaul · 2026-04-04T14:22:08Z

Summary

Replace per-parameter write_value(f, param) loops with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from (2*N + 1) to 1 for N query parameters in the execute/query path.
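To illustrate the write-count claim, here is a minimal sketch of the two approaches. It is not the driver's code: `CountingWriter`, `write_params_per_value` and `write_params_buffered` are hypothetical names, and the encoding is simplified to a 2-byte count followed by length-prefixed values (NULL as length -1, no UNSET handling).

```python
import io
import struct

# Pre-bound pack functions, similar in spirit to the local caching the PR mentions.
_int32_pack = struct.Struct(">i").pack
_uint16_pack = struct.Struct(">H").pack


class CountingWriter(io.BytesIO):
    """Hypothetical helper: a BytesIO that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def write_params_per_value(f, params):
    """Baseline shape: one write for the count, then two writes per non-NULL value."""
    f.write(_uint16_pack(len(params)))
    for p in params:
        if p is None:
            f.write(_int32_pack(-1))      # NULL: length marker only
        else:
            f.write(_int32_pack(len(p)))  # length prefix
            f.write(p)                    # value bytes


def write_params_buffered(f, params):
    """Buffer accumulation: collect all pieces, join once, write once."""
    parts = [_uint16_pack(len(params))]
    append = parts.append                 # cache the bound method for the loop
    for p in params:
        if p is None:
            append(_int32_pack(-1))
        else:
            append(_int32_pack(len(p)))
            append(p)
    f.write(b"".join(parts))


params = [b"a", b"bc", b"def"]
before, after = CountingWriter(), CountingWriter()
write_params_per_value(before, params)
write_params_buffered(after, params)
print(before.calls, after.calls)          # 2*3 + 1 writes versus 1 write
```

Both functions produce identical bytes; only the number of `write()` calls differs.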

What changed

`cassandra/protocol.py`

_QueryMessage._write_query_params() -- Buffer accumulation for the parameter loop. Local variable caching (_int32_pack, _parts_append) for Cython-friendly tight loop.
ExecuteMessage._write_query_params() -- Removed unnecessary super() pass-through override (now inherited directly from _QueryMessage).
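The pass-through removal can be pictured with stand-in classes. This is only a sketch: `QueryBaseSketch`, `ExecuteBeforeSketch` and `ExecuteAfterSketch` are hypothetical names, not the classes in `cassandra/protocol.py`, and the method body is a placeholder.

```python
class QueryBaseSketch:
    """Stand-in for the base class that owns the real implementation."""

    def _write_query_params(self, out, protocol_version):
        out.append(("params", protocol_version))


class ExecuteBeforeSketch(QueryBaseSketch):
    """Before: an override that only forwards to the parent."""

    def _write_query_params(self, out, protocol_version):
        super()._write_query_params(out, protocol_version)


class ExecuteAfterSketch(QueryBaseSketch):
    """After: no override, so the method is inherited directly."""


before_out, after_out = [], []
ExecuteBeforeSketch()._write_query_params(before_out, 4)
ExecuteAfterSketch()._write_query_params(after_out, 4)
print(before_out == after_out)  # behaviour is unchanged
```

Dropping the override removes one extra Python-level call per invocation without changing what gets written.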

`tests/unit/test_protocol.py`

Added 14 new test methods in WriteQueryParamsBufferAccumulationTest.

Benchmark

Measured with min() of timeit.repeat(repeat=7, number=200_000) on a quiet machine (load <3), Cython .so compiled, before/after rebuild.

| Scenario | Baseline (ns/call) | Buffer accum (ns/call) | Speedup |
| --- | --- | --- | --- |
| 128D float32 vector (1 param) | 892 | 953 | 0.94x |
| 1536D float32 vector (1 param) | 1077 | 1208 | 0.89x |
| 10 text columns | 2416 | 1783 | 1.35x |

For single-param workloads, the list/join overhead slightly exceeds the write-call savings. The benefit appears with multiple params (10+), where reducing 2*N writes to 1 join becomes significant. The real payoff is in BatchMessage.send_body() (PR #791), where N is much larger.
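The measurement method (minimum over `timeit.repeat` runs, converted to ns/call) can be sketched as follows. This is not the benchmark script from the PR: the two writer functions are simplified pure-Python stand-ins, and `ns_per_call` and `compare` are hypothetical helpers, so absolute numbers will differ from the Cython-compiled figures in the table.

```python
import io
import struct
import timeit

_int32_pack = struct.Struct(">i").pack
_uint16_pack = struct.Struct(">H").pack


def write_per_value(f, params):
    """Simplified baseline: 2*N + 1 write() calls."""
    f.write(_uint16_pack(len(params)))
    for p in params:
        f.write(_int32_pack(len(p)))
        f.write(p)


def write_buffered(f, params):
    """Simplified buffer accumulation: a single write() call."""
    parts = [_uint16_pack(len(params))]
    for p in params:
        parts.append(_int32_pack(len(p)))
        parts.append(p)
    f.write(b"".join(parts))


def ns_per_call(func, repeat=7, number=200_000):
    """Best-of-`repeat` timing, expressed in nanoseconds per call."""
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return best / number * 1e9


def compare(params, repeat=7, number=200_000):
    """Return (baseline ns/call, buffered ns/call, speedup = baseline / buffered)."""
    baseline = ns_per_call(lambda: write_per_value(io.BytesIO(), params), repeat, number)
    buffered = ns_per_call(lambda: write_buffered(io.BytesIO(), params), repeat, number)
    return baseline, buffered, baseline / buffered
```

For example, `compare([b"x" * 16] * 10)` approximates the "10 text columns" scenario, and `compare([b"\x00" * 512])` a single 128D float32 vector. Taking the minimum rather than the mean reduces the influence of scheduler noise on the result.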

Tests

14 new tests covering normal, NULL, UNSET, mixed, empty bytes, large vector, many params, cross-protocol
Full unit test suite passes (666 passed, 43 skipped)

mykaul · 2026-04-04T14:40:00Z

Just putting this out here:
Is there any value in ~0.1us improvement?

Honest answer: on its own, ~100ns per call is tiny. But context matters:

Where it adds up:

High-throughput vector INSERT pipelines doing 100K+ ops/sec: that's ~10ms/sec of CPU saved per core
The write path is single-threaded per connection, so it's on the critical path
Multiply by batch sizes (BatchMessage calls write_value per param per query)

Where it doesn't matter:

Network round-trip to Scylla is typically 100-500us minimum
Serialization of the vector bytes themselves dwarfs the write_value overhead
Real workloads are rarely bottlenecked on parameter serialization

The real argument for merging is code quality, not the 100ns:

Removing the redundant super() pass-through is a clear cleanup
The buffer accumulation is a well-known pattern (fewer writes = better) and the code isn't harder to read
It establishes a pattern for future BatchMessage optimization where the savings would be larger (N queries × M params)

But if the question is "would a user ever notice this in production?", the answer is no, almost certainly not. It's a micro-optimization. Whether it's worth the diff churn and review cost is a judgment call for the maintainers.
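The back-of-the-envelope numbers in this comment can be checked with a few lines. The helper names are hypothetical; the inputs (100ns saved per call, 100K ops/sec, 100us round trip) are the figures quoted above.

```python
def cpu_saved_ms_per_sec(saving_ns_per_call, calls_per_sec):
    """CPU time saved per wall-clock second, in milliseconds."""
    return saving_ns_per_call * calls_per_sec / 1e6


def saving_vs_round_trip(saving_ns_per_call, round_trip_us):
    """Per-call saving as a fraction of one network round trip."""
    return saving_ns_per_call / (round_trip_us * 1000)


print(cpu_saved_ms_per_sec(100, 100_000))   # 10.0 ms/sec, i.e. about 1% of one core
print(saving_vs_round_trip(100, 100))       # about 0.001 of a 100us round trip
```

Both views agree with the comment: the saving is measurable in aggregate at high throughput, but small next to a single network round trip.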

Replace the per-parameter write_value(f, param) loop in _QueryMessage._write_query_params() with a buffer accumulation approach: list.append + b"".join + single f.write(). This reduces the number of f.write() calls from 2*N+1 to 1, which is significant for vector workloads with large parameters. Also removes the redundant ExecuteMessage._write_query_params() pass-through override to avoid extra MRO lookup per call. Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector, and mixed parameter scenarios for both ExecuteMessage and QueryMessage. Includes a benchmark script (benchmarks/bench_execute_write_params.py).

mykaul marked this pull request as draft April 4, 2026 14:23

mykaul force-pushed the perf/buffer-accum-write-params branch 2 times, most recently from bc1545f to 9b21d5b Compare April 4, 2026 14:36

mykaul force-pushed the perf/buffer-accum-write-params branch 2 times, most recently from f2be2a8 to ac64459 Compare April 4, 2026 15:02

mykaul mentioned this pull request Apr 4, 2026

perf: buffer accumulation in BatchMessage.send_body() (2x speedup, us improvement, depends on PR #790) #791

Draft

mykaul changed the title ~~perf: buffer accumulation in _write_query_params() reduces f.write() calls~~ perf: buffer accumulation in _write_query_params() reduces f.write() calls (~100ns improvement, 1.1-1.3x speedup) Apr 7, 2026

mykaul force-pushed the perf/buffer-accum-write-params branch from ac64459 to 1e1e709 Compare April 7, 2026 08:29

mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/buffer-accum-write-params
