perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630
perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630mykaul wants to merge 5 commits intoscylladb:masterfrom
Conversation
|
CI failure is unrelated, and I've seen it happening on too many PRs already :-/ |
3a9031e to
ad8b6f9
Compare
dkropachev
left a comment
There was a problem hiding this comment.
numpy_parser.pyx, parsing.pyx also needs to be fixed
76c09ae to
f44a9ba
Compare
It wasn't clear to me what needs to fixing there: |
Instead of changing only While Abstract classes: |
Let me see if I understand this:
Is that an accurate representation?
I'm not sure I understand the interaction between Python and Cython - I was not going to change the interfaces whatsoever.
You raise a good point about the names of the functions - that can clearly be more clear with 'plain' and 'encrypted' in them! I'll fix that. |
f44a9ba to
132859f
Compare
132859f to
d87f917
Compare
Done in latest commit. I think it's ready for re-review. |
|
CI failure seems like known issue #446 |
|
Can this be reviewed? In my (limited) testing, it did show some nice improvement. It's just a branch or so, but apparently, was impacting to some degree the performance. |
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):
Fast path (common case):
- Reads raw column bytes and decodes types in a single pass per row
via _decode_row_inline(), eliminating the intermediate list-of-lists
- Skips ColDesc namedtuple creation entirely (only needed for CE)
- No closure allocation per call
Slow path (column encryption):
- Preserves full CE logic with ColDesc creation
- Moves decode_val/decode_row closures to module-level functions
(_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead
Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
1e272ed to
2b02175
Compare
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):
Fast path (common case):
- Reads raw column bytes and decodes types in a single pass per row
via _decode_row_inline(), eliminating the intermediate list-of-lists
- Skips ColDesc namedtuple creation entirely (only needed for CE)
- No closure allocation per call
Slow path (column encryption):
- Preserves full CE logic with ColDesc creation
- Moves decode_val/decode_row closures to module-level functions
(_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead
Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):
Fast path (common case):
- Reads raw column bytes and decodes types in a single pass per row
via _decode_row_inline(), eliminating the intermediate list-of-lists
- Skips ColDesc namedtuple creation entirely (only needed for CE)
- No closure allocation per call
- Wraps decode errors with column name/type info for diagnostics
Slow path (column encryption):
- Preserves full CE logic with ColDesc creation
- Moves decode_val/decode_row closures to module-level functions
(_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead
Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
Split recv_results_rows() into two code paths: one for when column_encryption_policy is set and one for the common case without it. This avoids the per-value 'ce_policy and ce_policy.contains_column(...)' check, which is redundant for the vast majority of queries.
Split TupleRowParser.unpack_row() into unpack_plain_row() and unpack_col_encrypted_row(), branching once in the callers (ListParser, LazyParser, row_parser error path) rather than per value. - Add boundscheck/wraparound Cython optimizations on both methods - Extract try/except outside the per-column loop - NumpyParser.parse_rows() raises NotImplementedError when column encryption is configured, since it has no decryption support - Update parsing.pxd declarations to match
Apply the same split-path optimization to the bind() method, checking for column_encryption_policy once rather than per parameter.
Cover both the pure-Python path (ResultMessage.recv_results_rows) and the Cython fast path (ListParser, NumpyParser) with and without column encryption policy.
…licy When column_encryption_policy is None (the common case), the ColDesc namedtuple list is never accessed by unpack_plain_row(). Skip building it entirely, passing None to ParseDesc instead. This avoids N namedtuple allocations per query (where N = number of columns) on the Cython fast path for non-encrypted workloads.
Rebased to origin/master + Pure-Python benchmarkRebased cleanly onto current Pure-Python recv_results_rows decode benchmarkThe original PR description covers Cython ListParser performance. Here are the pure-Python path numbers (10,000 rows, no Consistent 1.19–1.22x speedup on the pure-Python path by eliminating:
Environment: Python 3.14.3, no Cython |
Updated benchmarks — Cython and Pure-Python pathsAll benchmarks: 10,000 rows, no Cython ListParser (obj_parser.pyx + row_parser.pyx)Pure-Python (protocol.py recv_results_rows)Summary
Cython gains are larger for Int32-only workloads (up to 1.36x) because the per-column overhead eliminated by this PR is a bigger fraction of the total when the deserialization itself is cheap (Int32 = single |
Optimize column_encryption_policy checks in the result parsing hot path.
Previously, every decoded value checked
ce_policy and ce_policy.contains_column(coldesc)even when no encryption policy was configured (the common case). This PR splits the parsing into two code paths —unpack_plain_row()andunpack_col_encrypted_row()— and branches once per result set instead of per value.Changes:
recv_results_rows()pure-Python pathTupleRowParser.unpack_row()intounpack_plain_row()/unpack_col_encrypted_row(), with@boundscheck(False)/@wraparound(False)and try/except moved outside the per-column loopcolumn_encryption_policyNumpyParser.parse_rows()raisesNotImplementedErrorwhen column encryption is configured (it has no decryption support)BoundStatement.bind()Benchmark (Cython ListParser, 10k rows, no encryption policy)
Improvement comes from eliminating per-value overhead in the hot loop:
ce_policy and ce_policy.contains_column(coldesc)truthiness checkcoldesc = desc.coldescs[i]lookup (not needed in plain path)try/exceptper value moved outside the loop@cython.boundscheck(False)/@cython.wraparound(False)decoratorsFixes: #582
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.