perf: buffer accumulation in _write_query_params() reduces f.write() calls (~100ns improvement, 1.1-1.3x speedup)#790

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/buffer-accum-write-params

@mykaul mykaul commented Apr 4, 2026

Summary

Replace per-parameter write_value(f, param) loops with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from (2*N + 1) to 1 for N query parameters in the execute/query path.

What changed

cassandra/protocol.py

  1. _QueryMessage._write_query_params() -- Buffer accumulation for the parameter loop. Local variable caching (_int32_pack, _parts_append) for Cython-friendly tight loop.
  2. ExecuteMessage._write_query_params() -- Removed unnecessary super() pass-through override (now inherited directly from _QueryMessage).
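
The buffer accumulation described in item 1 can be sketched roughly as follows. This is not the driver's code: `CountingWriter`, `write_params_per_call`, and `write_params_buffered` are hypothetical names, and the layout (a 2-byte parameter count followed by 4-byte length-prefixed values) is an assumption made for illustration. The sketch handles plain bytes values only.

```python
import io
import struct

# Assumed layout for illustration: big-endian uint16 count, then for each
# value a big-endian int32 length followed by the raw bytes.
_uint16_pack = struct.Struct(">H").pack
_int32_pack = struct.Struct(">i").pack


class CountingWriter(io.BytesIO):
    """BytesIO that counts write() calls (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return super().write(data)


def write_params_per_call(f, params):
    """Baseline: one write for the count, then two writes per parameter."""
    f.write(_uint16_pack(len(params)))
    for param in params:
        f.write(_int32_pack(len(param)))
        f.write(param)


def write_params_buffered(f, params):
    """Buffer accumulation: collect the pieces, join once, write once."""
    parts = [_uint16_pack(len(params))]
    parts_append = parts.append  # cache the bound method for the loop
    int32_pack = _int32_pack     # cache the module global as a local
    for param in params:
        parts_append(int32_pack(len(param)))
        parts_append(param)
    f.write(b"".join(parts))


params = [b"col%d" % i for i in range(10)]
baseline, buffered = CountingWriter(), CountingWriter()
write_params_per_call(baseline, params)
write_params_buffered(buffered, params)
print(baseline.write_calls, buffered.write_calls)
```

With 10 parameters the baseline performs 2*10 + 1 = 21 writes and the buffered version performs 1, while both produce the same bytes.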

tests/unit/test_protocol.py

Added 14 new test methods in WriteQueryParamsBufferAccumulationTest.

Benchmark

Measured with min() of timeit.repeat(repeat=7, number=200_000) on a quiet machine (load <3), Cython .so compiled, before/after rebuild.
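
For readers who want to reproduce the methodology, a minimal sketch of the min-of-repeats measurement might look like this. The helper `ns_per_call` and the placeholder `sample_workload` are hypothetical; the actual benchmark script in this PR may be structured differently.

```python
import timeit


def ns_per_call(func, repeat=7, number=200_000):
    """Return the best-case cost of func() in nanoseconds per call."""
    # Each entry is the total time in seconds for `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number * 1e9


def sample_workload():
    # Placeholder workload (hypothetical); substitute the call under test.
    return b"".join([b"\x00\x00\x00\x04", b"data"])


if __name__ == "__main__":
    print(f"{ns_per_call(sample_workload):.0f} ns/call")
```

Taking the minimum rather than the mean is the usual choice for microbenchmarks of this kind, since noise from the rest of the system can only make a run slower.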

| Scenario | Baseline (ns/call) | Buffer accum (ns/call) | Speedup |
|---|---|---|---|
| 128D float32 vector (1 param) | 892 | 953 | 0.94x |
| 1536D float32 vector (1 param) | 1077 | 1208 | 0.89x |
| 10 text columns | 2416 | 1783 | 1.35x |

For single-param workloads, the list/join overhead slightly exceeds the write-call savings, so those cases regress (0.89-0.94x). The benefit appears with multiple params (10+), where collapsing 2*N + 1 writes into one join and one write becomes significant. The real payoff is in BatchMessage.send_body() (PR #791), where N is much larger.

Tests

  • 14 new tests covering normal, NULL, UNSET, mixed, empty bytes, large vector, many params, cross-protocol
  • Full unit test suite passes (666 passed, 43 skipped)
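
A byte-equivalence check in the spirit of these tests could be sketched as below. It is not the PR's WriteQueryParamsBufferAccumulationTest: `encode_reference`, `encode_buffered`, `UNSET`, and `BufferedEncodingTest` are hypothetical names, and the sketch assumes NULL is encoded as length -1 and UNSET as length -2 with no body, with a 2-byte count in front.

```python
import io
import struct
import unittest

_uint16_pack = struct.Struct(">H").pack
_int32_pack = struct.Struct(">i").pack

UNSET = object()  # hypothetical sentinel standing in for an "unset" marker


def encode_reference(params):
    """Per-parameter writes, used as the reference encoding."""
    f = io.BytesIO()
    f.write(_uint16_pack(len(params)))
    for p in params:
        if p is None:
            f.write(_int32_pack(-1))  # assumed NULL encoding: length -1
        elif p is UNSET:
            f.write(_int32_pack(-2))  # assumed UNSET encoding: length -2
        else:
            f.write(_int32_pack(len(p)))
            f.write(p)
    return f.getvalue()


def encode_buffered(params):
    """Same encoding, accumulated in a list and written once."""
    parts = [_uint16_pack(len(params))]
    append = parts.append
    for p in params:
        if p is None:
            append(_int32_pack(-1))
        elif p is UNSET:
            append(_int32_pack(-2))
        else:
            append(_int32_pack(len(p)))
            append(p)
    f = io.BytesIO()
    f.write(b"".join(parts))
    return f.getvalue()


class BufferedEncodingTest(unittest.TestCase):
    CASES = {
        "normal": [b"abc", b"de"],
        "null": [None],
        "unset": [UNSET],
        "mixed": [b"abc", None, UNSET, b""],
        "empty_bytes": [b""],
        "large_vector": [b"\x00" * (1536 * 4)],
        "many_params": [b"x"] * 1000,
        "no_params": [],
    }

    def test_matches_reference(self):
        for name, params in self.CASES.items():
            with self.subTest(name):
                self.assertEqual(encode_buffered(params), encode_reference(params))
```

Saved to a file, the case can be run with `python -m unittest`. Comparing against a straightforward reference encoder keeps the test independent of how the buffered path is optimized.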

@mykaul mykaul marked this pull request as draft April 4, 2026 14:23
@mykaul mykaul force-pushed the perf/buffer-accum-write-params branch 2 times, most recently from bc1545f to 9b21d5b Compare April 4, 2026 14:36

mykaul commented Apr 4, 2026

Just putting this here:
Is there any value in a ~0.1us improvement?

Honest answer: on its own, ~100ns per call is tiny. But context matters:
Where it adds up:

  • High-throughput vector INSERT pipelines doing 100K+ ops/sec: that's ~10ms/sec of CPU saved per core
  • The write path is single-threaded per connection, so it's on the critical path
  • Multiply by batch sizes (BatchMessage calls write_value per param per query)

Where it doesn't matter:

  • Network round-trip to Scylla is typically 100-500us minimum
  • Serialization of the vector bytes themselves dwarfs the write_value overhead
  • Real workloads are rarely bottlenecked on parameter serialization

The real argument for merging is code quality, not the 100ns:

  • Removing the redundant super() pass-through is a clear cleanup
  • The buffer accumulation is a well-known pattern (fewer writes = better) and the code isn't harder to read
  • It establishes a pattern for future BatchMessage optimization where the savings would be larger (N queries × M params)

But if the question is "would a user ever notice this in production?", the answer is no, almost certainly not. It's a micro-optimization. Whether it's worth the diff churn and review cost is a judgment call for the maintainers.
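
The back-of-the-envelope numbers in this comment can be checked with a few lines of arithmetic. The inputs are the figures quoted above (100 ns saved per call, 100K ops/sec, 100-500 us round trip); the variable names are only for illustration.

```python
# Figures quoted in the comment; integer nanoseconds avoid float rounding.
SAVED_NS_PER_CALL = 100
OPS_PER_SEC = 100_000
NS_PER_SEC = 1_000_000_000

# CPU time saved per second of wall-clock time on one core.
saved_ns_per_sec = SAVED_NS_PER_CALL * OPS_PER_SEC
saved_ms_per_sec = saved_ns_per_sec // 1_000_000
core_share_percent = 100 * saved_ns_per_sec // NS_PER_SEC

# Saving relative to a single request's network round trip.
rtt_low_ns, rtt_high_ns = 100_000, 500_000
share_of_fast_rtt_percent = 100 * SAVED_NS_PER_CALL / rtt_low_ns
share_of_slow_rtt_percent = 100 * SAVED_NS_PER_CALL / rtt_high_ns

print(f"{saved_ms_per_sec} ms/sec saved, {core_share_percent}% of one core")
print(f"{share_of_slow_rtt_percent:.2f}%-{share_of_fast_rtt_percent:.2f}% of one round trip")
```

Under these figures the saving is about 1% of one core at 100K ops/sec, and between 0.02% and 0.1% of a single round trip, which is consistent with the conclusion that a user would not notice it.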

@mykaul mykaul force-pushed the perf/buffer-accum-write-params branch 2 times, most recently from f2be2a8 to ac64459 Compare April 4, 2026 15:02
@mykaul mykaul changed the title from "perf: buffer accumulation in _write_query_params() reduces f.write() calls" to "perf: buffer accumulation in _write_query_params() reduces f.write() calls (~100ns improvement, 1.1-1.3x speedup)" Apr 7, 2026

Replace the per-parameter write_value(f, param) loop in
_QueryMessage._write_query_params() with a buffer accumulation approach:
list.append + b"".join + single f.write().

This reduces the number of f.write() calls from 2*N+1 to 1, which is
significant for vector workloads with large parameters.

Also removes the redundant ExecuteMessage._write_query_params()
pass-through override to avoid extra MRO lookup per call.

Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector,
and mixed parameter scenarios for both ExecuteMessage and QueryMessage.

Includes a benchmark script (benchmarks/bench_execute_write_params.py).
@mykaul mykaul force-pushed the perf/buffer-accum-write-params branch from ac64459 to 1e1e709 Compare April 7, 2026 08:29