perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) by mykaul · Pull Request #805 · scylladb/python-driver

mykaul · 2026-04-07T06:16:44Z

Summary

Optimize the result message processing hot path (_set_result, ResultMessage, ResultSet) for memory and speed. Each commit is an independent, focused optimization.

Commits

#	Commit	Description
1	`266e05efe`	Remove dead and duplicate class attributes in `ResultMessage`
2	`d64c0d038`	Skip redundant `None` attribute assignments in `decode_message`
3	`8b07394da`	Add `__slots__` to `ResultSet` to eliminate per-instance `__dict__`
4	`b675a9d21`	Avoid throwaway `ResultSet` creation in `fetch_next_page`
5	`7157b11f2`	Add `__slots__` to `_MessageType`, `ResultMessage`, and `FastResultMessage`
6	`2818aefa1`	Streamline tablet payload parsing in `_set_result`
7	`0a4609408`	Check `isinstance(ResultMessage)` first for direct attribute access

Details

Commit 1: Remove dead class attributes

ResultMessage had duplicate class-level defaults (kind = None declared twice, results = None and paging_state = None never read). Removed dead code.

Commit 2: Skip redundant None assignments in decode_message

decode_message unconditionally set msg.trace_id, msg.custom_payload, msg.warnings even when the values were None. Changed to conditional assignment (if value is not None), saving ~15 ns on the common case (no tracing, no warnings, no custom payload).

Commit 3: `slots` on ResultSet

Added __slots__ to ResultSet, eliminating the per-instance __dict__ (~104 bytes saved per ResultSet).

Commit 4: Avoid throwaway ResultSet in fetch_next_page

fetch_next_page called self.response_future.result() which created a new ResultSet, then immediately extracted ._current_rows from it and discarded it. Extracted _wait_for_result() from result() so fetch_next_page can get the raw rows directly, avoiding one ResultSet allocation per page fetch.

Commit 5: `slots` on _MessageType, ResultMessage, FastResultMessage

_MessageType.__slots__ = (): prevents accidental __dict__ on the base class.
ResultMessage.__slots__: moves all 19 attributes into slots, eliminating per-instance __dict__. All attributes are now explicitly initialized in __init__.
FastResultMessage.__slots__ = (): inherits parent slots without adding __dict__.
Updated _get_params to use getattr(message_obj, '__dict__', {}) since ResultMessage no longer has __dict__.

Commit 6: Streamline tablet payload parsing

Single .get() + is not None instead of in + .get() (eliminates duplicate dict lookup).
Tuple unpacking instead of 3 separate index calls.
Guards add_tablet with if tablet is not None (fixes latent bug — from_row can return None).
Uses local variable for custom_payload to avoid repeated self._custom_payload lookups.

Commit 7: isinstance(ResultMessage) first for direct attribute access

Restructured _set_result to check isinstance(response, ResultMessage) first (was checked after 3 getattr calls). On the hot path (successful query results), this replaces 3 getattr(response, attr, None) calls with direct attribute access, safe because ResultMessage.__slots__ + __init__ guarantee all attributes exist.

For the cold ErrorMessage path, getattr is retained (required because ErrorMessage lacks trace_id in its class definition). ConnectionException and generic Exception branches explicitly clear _warnings/_custom_payload to prevent stale values from prior retry attempts.

Benchmarks

All benchmarks: Python 3.14, best-of-N, min(timeit.repeat()), pure-Python .py (not Cython .so).

_set_result ROWS hot path (no tablets, no tracing)

Version	Time	vs origin/master
origin/master	293 ns	—
After commit 7	244 ns	-49 ns (1.20x)

Incremental: commit 6 → commit 7

Before (commit 6)	After (commit 7)	Saved
288 ns	242 ns	47 ns (1.19x)

Net impact per query (cumulative)

The optimizations span different parts of the per-query path:

decode_message (commit 2): ~15 ns saved (skip 3 redundant None assignments)
_set_result dispatch (commits 6+7): ~49 ns saved (streamlined tablet parsing + isinstance-first)
ResultSet allocation (commit 4): ~200-400 ns saved per page fetch (avoids throwaway ResultSet)
Memory (commits 3+5): ~104 bytes/ResultSet + ~200 bytes/ResultMessage saved (per-instance __dict__ eliminated)

Total per-query savings on the hot path: ~65 ns latency + ~300 bytes memory.

Test results

189 unit tests pass (test_response_future, test_policies, test_tablets, test_cluster, test_protocol, test_connection)
5 pre-existing test_resultset.py failures (from commit 4's mock interface change to _wait_for_result — these tests use Mock objects that need updating for the new internal API)

Remove 'results = None' (never assigned or read in production code), duplicate 'kind = None' (declared twice), and duplicate 'paging_state = None' (declared at lines 663 and 681). The canonical declarations at lines 672-685 are kept.

When trace_id, custom_payload, and warnings are None (the common case for non-traced, non-warned messages), skip the instance attribute assignment. The class-level defaults on _MessageType already provide None, so reading these attributes returns the correct value without the per-instance __dict__ write.

ResultSet is created for every query result and has a fixed set of 6 instance attributes. Adding __slots__ eliminates the per-instance __dict__ allocation (~100-200 bytes on CPython), reducing memory pressure on high-throughput workloads.

Extract _wait_for_result() from result() to return the raw result without wrapping in a ResultSet. Use it in fetch_next_page() to avoid creating a full ResultSet object just to extract _current_rows. For paged queries with N pages, this eliminates N-1 throwaway ResultSet allocations (each with 6 attributes).

…sage Add __slots__ to eliminate per-instance __dict__ for ResultMessage, the most common response type. ResultMessage has 19 instance attributes; eliminating the __dict__ saves ~200-400 bytes per decoded result message. Changes: - _MessageType: __slots__ = () to signal slots-awareness to subclasses - ResultMessage: full __slots__ with all 19 instance attributes initialized in __init__ (replaces class-level defaults that are incompatible with slots) - FastResultMessage: __slots__ = () to inherit parent slots without __dict__ - _get_params: handle slotted objects gracefully for __repr__ - Remove dead 'response.results' assignment from test mock

- Replace double dict lookup ('in' + .get()) with single .get() + None check - Use tuple unpacking instead of three separate indexing operations - Guard add_tablet with 'if tablet is not None' (from_row can return None for empty replicas — previously this was silently passed through) - Inline single-use locals (protocol, keyspace, table) Saves ~13 ns on the tablet-hit path (noise-level, but cleaner code).

… attr access Move the isinstance(response, ResultMessage) check to the top of _set_result so the hot path (successful query results) uses direct attribute access instead of getattr() with defaults. ResultMessage has trace_id, warnings, and custom_payload in __slots__, always initialised in __init__, so direct access is safe and avoids the overhead of 3 getattr() calls + 1 isinstance() that was previously done unconditionally before the type dispatch. For the cold ErrorMessage path, getattr() is still used because ErrorMessage does not declare trace_id in __slots__ or __init__. ConnectionException and generic Exception branches explicitly clear _warnings/_custom_payload to avoid stale values from prior retries. Benchmark (best-of-7, 500k iterations, Python 3.14, pure-Python): _set_result ROWS hot path (no tablets, no tracing): Before: 326 ns After: 271 ns (-55 ns, 1.20x)

…er-result list comprehensions For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set. Pre-compute and cache these lists on PreparedStatement at prepare time. In _set_result, use the cached lists directly instead of the per-response lists. The cache is invalidated when result_metadata is updated during re-prepare. Benchmark (column_names + column_types extraction): 5 cols: 226 ns -> 30 ns (7.4x) 10 cols: 340 ns -> 28 ns (12.2x) 20 cols: 589 ns -> 31 ns (18.9x) 50 cols: 1160 ns -> 29 ns (39.6x)

mykaul · 2026-04-10T21:02:37Z

Follow-up commit: cache column_names/column_types on PreparedStatement

Commit: 580b4556c — perf: cache column_names/column_types on PreparedStatement to avoid per-result list comprehensions

Change

For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set in recv_results_rows.

This commit:

Pre-computes _result_col_names and _result_col_types on PreparedStatement at prepare time
In _set_result, uses the cached lists directly instead of the per-response lists
Invalidates the cache when result_metadata is updated during re-prepare

Benchmark results (`benchmarks/bench_col_names_cache.py`)

Python 3.14.3
  5 cols:  226 ns -> 30 ns (7.4x)
  10 cols: 340 ns -> 28 ns (12.2x)
  20 cols: 589 ns -> 31 ns (18.9x)
  50 cols: 1160 ns -> 29 ns (39.6x)

Tests

611 unit tests passed, 0 failures.

…cess Two micro-optimizations in _set_result() hot path: 1. Reorder the RESULT_KIND if/elif chain to check ROWS first (was third), since it is by far the most common result type. VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are now last. 2. Add continuous_paging_options = None class attribute to _QueryMessage, allowing direct attribute access instead of getattr(self.message, 'continuous_paging_options', None). Benchmark (2M iters, Python 3.14): RESULT_KIND reorder: 35.5 -> 24.3 ns (1.46x, -11.2 ns/dispatch) getattr -> direct: 32.0 -> 18.3 ns (1.75x, -13.7 ns/access) Combined: ~25 ns saved per query

mykaul · 2026-04-10T21:28:41Z

Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access

Commit: 5bbbc90

What changed

Two micro-optimizations in _set_result() hot path:

Reorder RESULT_KIND dispatch: RESULT_KIND_ROWS is now checked first (was third). RESULT_KIND_VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are last. Since ROWS is by far the most common result kind, this avoids 2 unnecessary comparisons per query.
Replace getattr with direct access: Added continuous_paging_options = None class attribute to _QueryMessage in protocol.py, enabling self.message.continuous_paging_options instead of getattr(self.message, 'continuous_paging_options', None).

Benchmark results (Python 3.14, 2M iterations)

Optimization	Before	After	Saving	Speedup
RESULT_KIND reorder (ROWS hit)	35.5 ns	24.3 ns	11.2 ns	1.46x
getattr → direct attribute	32.0 ns	18.3 ns	13.7 ns	1.75x
Combined per-query			~25 ns

Testing

611 unit tests passed (10.0s)
No regressions (pre-existing test_eq failure due to stale Cython resolved by rebuild)

mykaul changed the title ~~perf: result path memory allocation optimizations~~ perf: result path memory allocation optimizations (mostly memory savings) Apr 7, 2026

mykaul added 6 commits April 9, 2026 15:46

mykaul force-pushed the perf/result-path-cleanup branch from 709beae to 2818aef Compare April 9, 2026 12:50

mykaul changed the title ~~perf: result path memory allocation optimizations (mostly memory savings)~~ perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) Apr 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805
mykaul wants to merge 9 commits intoscylladb:masterfrom
mykaul:perf/result-path-cleanup

mykaul commented Apr 7, 2026 •

edited

Loading

Uh oh!

mykaul commented Apr 10, 2026

Uh oh!

mykaul commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mykaul commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Commits

Details

Commit 1: Remove dead class attributes

Commit 2: Skip redundant None assignments in decode_message

Commit 3: __slots__ on ResultSet

Commit 4: Avoid throwaway ResultSet in fetch_next_page

Commit 5: __slots__ on _MessageType, ResultMessage, FastResultMessage

Commit 6: Streamline tablet payload parsing

Commit 7: isinstance(ResultMessage) first for direct attribute access

Benchmarks

_set_result ROWS hot path (no tablets, no tracing)

Incremental: commit 6 → commit 7

Net impact per query (cumulative)

Test results

Uh oh!

mykaul commented Apr 10, 2026

Follow-up commit: cache column_names/column_types on PreparedStatement

Change

Benchmark results (benchmarks/bench_col_names_cache.py)

Tests

Uh oh!

mykaul commented Apr 10, 2026

Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access

What changed

Benchmark results (Python 3.14, 2M iterations)

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mykaul commented Apr 7, 2026 •

edited

Loading

Commit 3: `slots` on ResultSet

Commit 5: `slots` on _MessageType, ResultMessage, FastResultMessage

Benchmark results (`benchmarks/bench_col_names_cache.py`)