
(improvement) Skip tuple(partial(...)) construction for empty callbacks/errbacks (190ns saving per call - 63%)#803

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/skip-empty-callbacks

mykaul commented Apr 6, 2026

Summary

In _set_final_result and _set_final_exception, skip building the tuple(partial(fn, ...)) when the callbacks/errbacks list is empty. Most queries use the synchronous result() path and never register callbacks or errbacks.

Motivation

Every query completion (_set_final_result / _set_final_exception) previously constructed a tuple(partial(fn, response, *args, **kwargs) for ...) from the callbacks/errbacks list, even when that list is empty. This creates an empty generator and an empty tuple on every query. The synchronous result() API (the overwhelmingly common usage pattern) never registers callbacks, so this work is wasted.

Benchmark (CPython 3.14, per-call)

| Callback state | Original | Optimized | Savings per call |
|---|---|---|---|
| Empty (synchronous result()) | 304ns | 114ns | 190ns (63%) |
| 1 callback (async path) | 562ns | 560ns | 2ns (no regression) |

Because the synchronous result() API is the common path, this saves about 190ns per query completion for most callers.
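A rough way to reproduce the comparison is sketched below. It is not the benchmark script behind the table above: it isolates only the tuple construction, the helper names (build_always, build_guarded, per_call_ns) are made up for illustration, and absolute numbers will differ by machine and Python version.

```python
import timeit
from functools import partial


def build_always(callbacks, response):
    """Original shape: always build the tuple, even for an empty list."""
    return tuple(
        partial(fn, response, *args, **kwargs)
        for (fn, args, kwargs) in callbacks
    )


def build_guarded(callbacks, response):
    """Optimized shape: skip the generator and tuple when nothing is registered."""
    if callbacks:
        return tuple(
            partial(fn, response, *args, **kwargs)
            for (fn, args, kwargs) in callbacks
        )
    return None


def per_call_ns(func, callbacks, number=200_000):
    """Average nanoseconds per call of func over `number` iterations."""
    seconds = timeit.timeit(lambda: func(callbacks, "rows"), number=number)
    return seconds / number * 1e9


if __name__ == "__main__":
    empty = []
    one = [(print, (), {})]  # print is only wrapped in a partial, never called
    for label, cbs in (("empty", empty), ("1 callback", one)):
        always = per_call_ns(build_always, cbs)
        guarded = per_call_ns(build_guarded, cbs)
        print(f"{label}: always={always:.0f}ns guarded={guarded:.0f}ns")
```

The lambda wrapper adds the same fixed overhead to both variants, so the difference between the two columns is more meaningful than either absolute value.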

Changes

  • cassandra/cluster.py:
    • _set_final_result: check if callbacks: before building to_call; otherwise set to_call = None
    • _set_final_exception: same pattern for errbacks
    • Guard the loop that applies the callbacks with an if to_call: check
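The pattern described in the list above can be sketched as follows. FutureSketch is a hypothetical, heavily simplified stand-in and not the driver's actual class: timers, metrics, and the errback side are omitted, and add_callback here does not handle the case where a result is already set.

```python
import threading
from functools import partial


class FutureSketch:
    """Hypothetical stand-in for the driver's response future (illustration only)."""

    def __init__(self):
        self._callback_lock = threading.Lock()
        self._callbacks = []
        self._event = threading.Event()
        self._final_result = None

    def add_callback(self, fn, *args, **kwargs):
        # Simplified: only registers the callback for later execution.
        with self._callback_lock:
            self._callbacks.append((fn, args, kwargs))

    def _set_final_result(self, response):
        with self._callback_lock:
            self._final_result = response
            # The emptiness check and the snapshot both happen under the lock.
            if self._callbacks:
                to_call = tuple(
                    partial(fn, response, *args, **kwargs)
                    for (fn, args, kwargs) in self._callbacks
                )
            else:
                to_call = None  # nothing registered: skip generator and tuple

        self._event.set()

        # Callbacks run outside the lock; None is falsy, so the loop is skipped.
        if to_call:
            for callback_partial in to_call:
                callback_partial()

    def result(self):
        # Synchronous path: wait for completion, no callbacks involved.
        self._event.wait()
        return self._final_result
```

_set_final_exception would follow the same shape with the errbacks list in place of the callbacks list.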

Testing

Unit tests pass (58/58 in test_response_future.py and test_cluster.py). The lock semantics are preserved — the check and tuple construction still happen inside _callback_lock.

In _set_final_result and _set_final_exception, skip building the
tuple of partial(fn, ...) when the callbacks/errbacks list is empty.

Most queries use the synchronous result() path and never register
callbacks or errbacks, so this avoids constructing an empty tuple
and the associated generator overhead on every query completion.
mykaul marked this pull request as draft April 6, 2026 19:26

mykaul commented Apr 6, 2026

Benchmark results (CPython 3.14, 500k iterations)

| Callback state | Original | Optimized | Δ per call |
|---|---|---|---|
| Empty (synchronous result()) | 304ns | 114ns | -190ns |
| 1 callback (async path) | 562ns | 560ns | -2ns (no regression) |

The synchronous result() API is the overwhelmingly common usage pattern and never registers callbacks. This skips the tuple(partial(fn, ...) for ...) generator + empty tuple construction on every query completion.

mykaul changed the title from "(improvement) Skip tuple(partial(...)) construction for empty callbacks/errbacks" to "(improvement) Skip tuple(partial(...)) construction for empty callbacks/errbacks (190ns saving per call - 63%)" Apr 7, 2026