perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) #805
mykaul wants to merge 9 commits into scylladb:master
Remove 'results = None' (never assigned or read in production code), duplicate 'kind = None' (declared twice), and duplicate 'paging_state = None' (declared at lines 663 and 681). The canonical declarations at lines 672-685 are kept.
When trace_id, custom_payload, and warnings are None (the common case for non-traced, non-warned messages), skip the instance attribute assignment. The class-level defaults on _MessageType already provide None, so reading these attributes returns the correct value without the per-instance __dict__ write.
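A minimal sketch of the idea, using a hypothetical `Message` class and `decode` helper rather than the driver's actual `decode_message`: when the class already supplies `None`, skipping the assignment leaves the instance `__dict__` untouched.

```python
class Message:
    # Class-level defaults: instances read None unless they override it.
    trace_id = None
    custom_payload = None
    warnings = None


def decode(trace_id=None, custom_payload=None, warnings=None):
    """Hypothetical decoder: only write attributes that carry a value."""
    msg = Message()
    if trace_id is not None:
        msg.trace_id = trace_id
    if custom_payload is not None:
        msg.custom_payload = custom_payload
    if warnings is not None:
        msg.warnings = warnings
    return msg


plain = decode()
traced = decode(trace_id="abc")
print(plain.trace_id, plain.__dict__)    # None {}
print(traced.trace_id, traced.__dict__)  # abc {'trace_id': 'abc'}
```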
ResultSet is created for every query result and has a fixed set of 6 instance attributes. Adding __slots__ eliminates the per-instance __dict__ allocation (~100-200 bytes on CPython), reducing memory pressure on high-throughput workloads.
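For illustration only, the effect can be seen on two hypothetical classes (not the driver's `ResultSet`, and with two attributes instead of six). The exact byte counts depend on the CPython version, so the sketch prints them instead of asserting a figure.

```python
import sys


class PlainResult:
    def __init__(self, rows):
        self.rows = rows
        self.page = 0


class SlottedResult:
    __slots__ = ("rows", "page")  # fixed attribute set, no per-instance __dict__

    def __init__(self, rows):
        self.rows = rows
        self.page = 0


plain = PlainResult([])
slotted = SlottedResult([])

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False
# Size of the per-instance dict that the slotted class never allocates.
print(sys.getsizeof(plain.__dict__))

try:
    slotted.extra = 1  # attributes outside __slots__ are rejected
except AttributeError as exc:
    print("rejected:", exc)
```

The trade-off is that code which attaches ad-hoc attributes to instances stops working once `__slots__` is in place.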
Extract _wait_for_result() from result() to return the raw result without wrapping in a ResultSet. Use it in fetch_next_page() to avoid creating a full ResultSet object just to extract _current_rows. For paged queries with N pages, this eliminates N-1 throwaway ResultSet allocations (each with 6 attributes).
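The shape of the refactor can be sketched roughly as follows. `Future` and this `ResultSet` are simplified stand-ins written for the example, and the paging logic is an assumption; only the names `_wait_for_result`, `result`, `fetch_next_page` and `_current_rows` come from the commit message.

```python
class ResultSet:
    def __init__(self, future, rows):
        self.future = future
        self._current_rows = rows

    def fetch_next_page(self):
        # Take the raw rows directly; no intermediate ResultSet is built.
        self._current_rows = self.future._wait_for_result()


class Future:
    """Hypothetical stand-in for the driver's response future."""

    def __init__(self, pages):
        self._pages = iter(pages)

    def _wait_for_result(self):
        # The real method would block until the response arrives.
        return next(self._pages)

    def result(self):
        # The public API still wraps the raw rows for callers.
        return ResultSet(self, self._wait_for_result())


future = Future([[1, 2], [3, 4]])
rs = future.result()      # one ResultSet for the whole query
rs.fetch_next_page()      # second page reuses the same ResultSet
print(rs._current_rows)   # [3, 4]
```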
…sage

Add __slots__ to eliminate per-instance __dict__ for ResultMessage, the most common response type. ResultMessage has 19 instance attributes; eliminating the __dict__ saves ~200-400 bytes per decoded result message.

Changes:
- _MessageType: __slots__ = () to signal slots-awareness to subclasses
- ResultMessage: full __slots__ with all 19 instance attributes initialized in __init__ (replaces class-level defaults that are incompatible with slots)
- FastResultMessage: __slots__ = () to inherit parent slots without __dict__
- _get_params: handle slotted objects gracefully for __repr__
- Remove dead 'response.results' assignment from test mock
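As a rough illustration of how slots behave across a class hierarchy, here is a sketch with hypothetical classes and two attributes instead of 19. The `describe` helper is an assumption about what a slots-aware repr helper could look like; it is not the driver's `_get_params`.

```python
class BaseMessage:
    __slots__ = ()  # no slots of its own, but also no __dict__


class Result(BaseMessage):
    __slots__ = ("kind", "paging_state")

    def __init__(self, kind):
        # Every slot is initialised: reading an unset slot raises AttributeError.
        self.kind = kind
        self.paging_state = None


class FastResult(Result):
    __slots__ = ()  # inherit the parent's slots without regaining __dict__


def describe(obj):
    """Hypothetical repr helper that tolerates slotted objects."""
    params = dict(getattr(obj, "__dict__", {}))
    for cls in type(obj).__mro__:
        for name in getattr(cls, "__slots__", ()):
            if hasattr(obj, name):
                params[name] = getattr(obj, name)
    return params


msg = FastResult(kind=2)
print(hasattr(msg, "__dict__"))  # False
print(describe(msg))             # {'kind': 2, 'paging_state': None}
```

If any class in the chain omits `__slots__`, its instances get a `__dict__` again, which is why the base class and the subclass both declare an empty tuple.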
- Replace double dict lookup ('in' + .get()) with single .get() + None check
- Use tuple unpacking instead of three separate indexing operations
- Guard add_tablet with 'if tablet is not None' (from_row can return None
for empty replicas — previously this was silently passed through)
- Inline single-use locals (protocol, keyspace, table)
Saves ~13 ns on the tablet-hit path (noise-level, but cleaner code).
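The three changes can be sketched together. Everything here is hypothetical: the `routing_info` key, the `from_row` signature and the list-based tablet store are placeholders for the example, not the driver's real payload key or API.

```python
def from_row(first_token, last_token, replicas):
    """Hypothetical constructor: returns None when there are no replicas."""
    if not replicas:
        return None
    return (first_token, last_token, tuple(replicas))


def record_tablet(custom_payload, tablets):
    info = custom_payload.get("routing_info")  # one lookup instead of `in` + .get()
    if info is None:
        return
    first_token, last_token, replicas = info   # tuple unpacking, not info[0..2]
    tablet = from_row(first_token, last_token, replicas)
    if tablet is not None:                     # skip empty-replica results
        tablets.append(tablet)


tablets = []
record_tablet({"routing_info": (0, 99, ["host-a"])}, tablets)
record_tablet({"routing_info": (100, 199, [])}, tablets)  # from_row returns None
record_tablet({}, tablets)                                # no routing info at all
print(tablets)  # [(0, 99, ('host-a',))]
```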
… attr access
Move the isinstance(response, ResultMessage) check to the top of
_set_result so the hot path (successful query results) uses direct
attribute access instead of getattr() with defaults.
ResultMessage has trace_id, warnings, and custom_payload in __slots__,
always initialised in __init__, so direct access is safe and avoids the
overhead of 3 getattr() calls + 1 isinstance() that was previously done
unconditionally before the type dispatch.
For the cold ErrorMessage path, getattr() is still used because
ErrorMessage does not declare trace_id in __slots__ or __init__.
ConnectionException and generic Exception branches explicitly clear
_warnings/_custom_payload to avoid stale values from prior retries.
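A simplified sketch of the dispatch order described above. The classes and the `set_result` method are stand-ins written for this example; the real `_set_result` handles many more cases.

```python
class ResultMessage:
    __slots__ = ("trace_id", "warnings", "custom_payload", "rows")

    def __init__(self, rows, warnings=None):
        self.trace_id = None
        self.warnings = warnings
        self.custom_payload = None
        self.rows = rows


class ErrorMessage:
    def __init__(self, summary):
        self.summary = summary  # no warnings/custom_payload attributes


class Future:
    """Hypothetical future holding per-attempt state."""

    def __init__(self):
        self._warnings = None
        self._custom_payload = None
        self.outcome = None

    def set_result(self, response):
        if isinstance(response, ResultMessage):
            # Hot path: slots guarantee the attributes exist.
            self._warnings = response.warnings
            self._custom_payload = response.custom_payload
            self.outcome = response.rows
        elif isinstance(response, ErrorMessage):
            # Cold path: attributes may be missing, so keep getattr().
            self._warnings = getattr(response, "warnings", None)
            self._custom_payload = getattr(response, "custom_payload", None)
            self.outcome = response.summary
        else:
            # Exceptions: clear state left over from an earlier attempt.
            self._warnings = None
            self._custom_payload = None
            self.outcome = response


future = Future()
future.set_result(ResultMessage([1], warnings=["slow query"]))
print(future._warnings)  # ['slow query']
future.set_result(ConnectionError("lost"))
print(future._warnings)  # None
```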
Benchmark (best-of-7, 500k iterations, Python 3.14, pure-Python):
_set_result ROWS hot path (no tablets, no tracing):
Before: 326 ns
After: 271 ns (-55 ns, 1.20x)
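For readers who want to reproduce this kind of measurement, a best-of-7 micro-benchmark can be put together with `timeit.repeat` and `min`. This is only a sketch of the method on a hypothetical `Msg` class; it does not benchmark `_set_result` itself, and the numbers it prints will vary by machine and Python version.

```python
import timeit


class Msg:
    """Hypothetical slotted message used only as a benchmark target."""
    __slots__ = ("warnings",)

    def __init__(self):
        self.warnings = None


msg = Msg()
ITERATIONS = 100_000  # fewer than the 500k used above so the sketch runs quickly


def best_ns(stmt, repeat=7):
    """Return the fastest per-iteration time in nanoseconds."""
    runs = timeit.repeat(stmt, repeat=repeat, number=ITERATIONS,
                         globals={"msg": msg})
    return min(runs) / ITERATIONS * 1e9


via_getattr = best_ns("getattr(msg, 'warnings', None)")
direct = best_ns("msg.warnings")
print(f"getattr: {via_getattr:.1f} ns, direct: {direct:.1f} ns")
```

Taking the minimum rather than the mean is the usual choice for this kind of test, since slower runs mostly reflect interference from other processes rather than the code under test.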
…er-result list comprehensions

For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set.

Pre-compute and cache these lists on PreparedStatement at prepare time. In _set_result, use the cached lists directly instead of the per-response lists. The cache is invalidated when result_metadata is updated during re-prepare.

Benchmark (column_names + column_types extraction):
5 cols: 226 ns -> 30 ns (7.4x)
10 cols: 340 ns -> 28 ns (12.2x)
20 cols: 589 ns -> 31 ns (18.9x)
50 cols: 1160 ns -> 29 ns (39.6x)
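One possible shape for such a cache, sketched on a hypothetical `PreparedStatement` stand-in. The `_set_result_metadata` helper and the `(name, type)` tuple layout are assumptions made for the example.

```python
class PreparedStatement:
    """Hypothetical stand-in; attribute and method names are illustrative."""

    def __init__(self, result_metadata):
        self._set_result_metadata(result_metadata)

    def _set_result_metadata(self, result_metadata):
        # Called at prepare time and again on re-prepare, so the cached
        # lists can never outlive the metadata they were built from.
        self.result_metadata = result_metadata
        self.column_names = [col[0] for col in result_metadata]
        self.column_types = [col[1] for col in result_metadata]


stmt = PreparedStatement([("id", int), ("name", str)])

# Every result for this statement shares the same list objects.
print(stmt.column_names is stmt.column_names)  # True
print(stmt.column_names)                       # ['id', 'name']

stmt._set_result_metadata([("id", int)])       # simulated re-prepare
print(stmt.column_names)                       # ['id']
```

Because the lists are shared between results, this approach assumes that callers treat them as read-only.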
Follow-up commit: cache column_names/column_types on PreparedStatement
…cess

Two micro-optimizations in _set_result() hot path:

1. Reorder the RESULT_KIND if/elif chain to check ROWS first (was third), since it is by far the most common result type. VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are now last.
2. Add continuous_paging_options = None class attribute to _QueryMessage, allowing direct attribute access instead of getattr(self.message, 'continuous_paging_options', None).

Benchmark (2M iters, Python 3.14):
RESULT_KIND reorder: 35.5 -> 24.3 ns (1.46x, -11.2 ns/dispatch)
getattr -> direct: 32.0 -> 18.3 ns (1.75x, -13.7 ns/access)
Combined: ~25 ns saved per query
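Both changes can be illustrated in a few lines. The constants, the `dispatch` function and the `QueryMessage` class below are stand-ins for this sketch, not the driver's definitions.

```python
# Illustrative constants; the driver defines its own RESULT_KIND values.
KIND_VOID, KIND_ROWS, KIND_SET_KEYSPACE, KIND_SCHEMA_CHANGE = 1, 2, 3, 5


def dispatch(kind):
    # The most common kind is tested first, so it needs a single comparison.
    if kind == KIND_ROWS:
        return "rows"
    elif kind == KIND_VOID:
        return "void"
    elif kind == KIND_SET_KEYSPACE:
        return "set_keyspace"
    elif kind == KIND_SCHEMA_CHANGE:
        return "schema_change"
    return "other"


class QueryMessage:
    # The class-level default makes direct attribute access safe on every instance.
    continuous_paging_options = None

    def __init__(self, query):
        self.query = query


message = QueryMessage("SELECT * FROM t")
print(message.continuous_paging_options)  # None, without getattr(..., None)
print(dispatch(KIND_ROWS))                # rows
```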
Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access
Commit: 5bbbc90
What changed: two micro-optimizations in the _set_result() hot path.
Summary
Optimize the result message processing hot path (_set_result, ResultMessage, ResultSet) for memory and speed. Each commit is an independent, focused optimization.

Commits
1. 266e05efe: … ResultMessage
2. d64c0d038: … None attribute assignments in decode_message
3. 8b07394da: … __slots__ to ResultSet to eliminate per-instance __dict__
4. b675a9d21: … ResultSet creation in fetch_next_page
5. 7157b11f2: … __slots__ to _MessageType, ResultMessage, and FastResultMessage
6. 2818aefa1: … _set_result
7. 0a4609408: isinstance(ResultMessage) first for direct attribute access

Details
Commit 1: Remove dead class attributes
ResultMessage had duplicate class-level defaults (kind = None declared twice, results = None and paging_state = None never read). Removed dead code.

Commit 2: Skip redundant None assignments in decode_message
decode_message unconditionally set msg.trace_id, msg.custom_payload, msg.warnings even when the values were None. Changed to conditional assignment (if value is not None), saving ~15 ns on the common case (no tracing, no warnings, no custom payload).

Commit 3: __slots__ on ResultSet
Added __slots__ to ResultSet, eliminating the per-instance __dict__ (~104 bytes saved per ResultSet).

Commit 4: Avoid throwaway ResultSet in fetch_next_page
fetch_next_page called self.response_future.result(), which created a new ResultSet, then immediately extracted ._current_rows from it and discarded it. Extracted _wait_for_result() from result() so fetch_next_page can get the raw rows directly, avoiding one ResultSet allocation per page fetch.

Commit 5: __slots__ on _MessageType, ResultMessage, FastResultMessage
- _MessageType.__slots__ = (): prevents accidental __dict__ on the base class.
- ResultMessage.__slots__: moves all 19 attributes into slots, eliminating per-instance __dict__. All attributes are now explicitly initialized in __init__.
- FastResultMessage.__slots__ = (): inherits parent slots without adding __dict__.
- … _get_params to use getattr(message_obj, '__dict__', {}) since ResultMessage no longer has __dict__.

Commit 6: Streamline tablet payload parsing
- … .get() + is not None instead of in + .get() (eliminates duplicate dict lookup).
- Guard add_tablet with if tablet is not None (fixes latent bug: from_row can return None).
- … custom_payload to avoid repeated self._custom_payload lookups.

Commit 7: isinstance(ResultMessage) first for direct attribute access
Restructured _set_result to check isinstance(response, ResultMessage) first (was checked after 3 getattr calls). On the hot path (successful query results), this replaces 3 getattr(response, attr, None) calls with direct attribute access, safe because ResultMessage.__slots__ + __init__ guarantee all attributes exist.

For the cold ErrorMessage path, getattr is retained (required because ErrorMessage lacks trace_id in its class definition). ConnectionException and generic Exception branches explicitly clear _warnings/_custom_payload to prevent stale values from prior retry attempts.

Benchmarks
All benchmarks: Python 3.14, best-of-N, min(timeit.repeat()), pure-Python .py (not Cython .so).

_set_result ROWS hot path (no tablets, no tracing)

Incremental: commit 6 → commit 7

Net impact per query (cumulative)
The optimizations span different parts of the per-query path:
… __dict__ eliminated)

Total per-query savings on the hot path: ~65 ns latency + ~300 bytes memory.
Test results
… test_resultset.py failures (from commit 4's mock interface change to _wait_for_result; these tests use Mock objects that need updating for the new internal API)