perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python) by mykaul · Pull Request #630 · scylladb/python-driver

mykaul · 2025-12-25T17:21:24Z

Optimize column_encryption_policy checks in the result parsing hot path.

Previously, every decoded value checked ce_policy and ce_policy.contains_column(coldesc) even when no encryption policy was configured (the common case). This PR splits the parsing into two code paths — unpack_plain_row() and unpack_col_encrypted_row() — and branches once per result set instead of per value.

Changes:

protocol.py: Split recv_results_rows() pure-Python path
obj_parser.pyx: Split TupleRowParser.unpack_row() into unpack_plain_row() / unpack_col_encrypted_row(), with @boundscheck(False) / @wraparound(False) and try/except moved outside the per-column loop
ListParser / LazyParser / row_parser: Branch once on column_encryption_policy
numpy_parser.pyx: NumpyParser.parse_rows() raises NotImplementedError when column encryption is configured (it has no decryption support)
query.py: Same split-path optimization in BoundStatement.bind()

Benchmark (Cython ListParser, 10k rows, no encryption policy)

                      Master            PR            Speedup
2 cols (Int32+UTF8):  3.67 ms           2.08 ms       1.77x  (2.7M -> 4.8M rows/s)
10 cols (all Int32):  8.37 ms           5.18 ms       1.62x  (1.2M -> 1.9M rows/s)

Improvement comes from eliminating per-value overhead in the hot loop:

ce_policy and ce_policy.contains_column(coldesc) truthiness check
coldesc = desc.coldescs[i] lookup (not needed in plain path)
try/except per value moved outside the loop
@cython.boundscheck(False) / @cython.wraparound(False) decorators

Fixes: #582

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

mykaul · 2025-12-25T19:30:16Z

CI failure is unrelated, and I've seen it happening on too many PRs already :-/

tests/unit/test_protocol_decode_optimization.py

dkropachev

numpy_parser.pyx, parsing.pyx also needs to be fixed

tests/unit/test_protocol_decode_optimization.py

mykaul · 2026-01-06T08:32:40Z

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there:
numpy - it doesn't use/call column_encryption
parsing.pyx - there's just 'not implemented' functions there.

dkropachev · 2026-01-06T15:10:16Z

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there: numpy - it doesn't use/call column_encryption parsing.pyx - there's just 'not implemented' functions there.

Instead of changing only TupleRowParser class you need to do similar changes to ColumnParser and RowParser and all their implementations: ListParser, LazyParser and NumpyParser, you will discover that NumpyParser disregards column_encryption_policy which means that it does not work properly when when columns are encrypted, have a plug there that throwing NotImplementedError.
In total all Parser implementation classes should contain implementations for:

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

While Abstract classes:

cdef class ColumnParser:
    cpdef parse_rows(self, BytesIOReader reader, ParseDesc desc):
        if desc.column_encryption_policy:
            return self.unpack_enc_row(reader, desc)
        return self.unpack_plain_row(reader, desc)

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

mykaul · 2026-01-06T15:47:17Z

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there: numpy - it doesn't use/call column_encryption parsing.pyx - there's just 'not implemented' functions there.

Instead of changing only TupleRowParser class you need to do similar changes to ColumnParser and RowParser and all their implementations: ListParser, LazyParser and NumpyParser, you will discover that NumpyParser disregards column_encryption_policy which means that it does not work properly when when columns are encrypted, have a plug there that throwing NotImplementedError. In total all Parser implementation classes should contain implementations for:

Let me see if I understand this:

TupleRowParser - Implemented unpack_ce_row() and unpack_row()
ColumnParser - its parse_rows() creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
RowParser- its unpack_row() is raising NotImplementedError - I'm not sure I should change this in this PR?
ListParser - its parse_rows() creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
LazyParser - its parse_rows() calls parse_rows_lazy() - which creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
NumpyParser its parse_rows() calls its _parse_rows() which completely ignores encryption and calls unpack_row() - but I have no idea where it's implemented. BUT - I'm not introducing a regression here, no?
BoundStatement.bind() - added in the last commit in the series.

Is that an accurate representation?

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

I'm not sure I understand the interaction between Python and Cython - I was not going to change the interfaces whatsoever.

While Abstract classes:

cdef class ColumnParser:
    cpdef parse_rows(self, BytesIOReader reader, ParseDesc desc):
        if desc.column_encryption_policy:
            return self.unpack_enc_row(reader, desc)
        return self.unpack_plain_row(reader, desc)

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

You raise a good point about the names of the functions - that can clearly be more clear with 'plain' and 'encrypted' in them! I'll fix that.

mykaul · 2026-01-27T16:41:16Z

You raise a good point about the names of the functions - that can clearly be more clear with 'plain' and 'encrypted' in them! I'll fix that.

Done in latest commit. I think it's ready for re-review.

mykaul · 2026-01-27T17:03:53Z

CI failure seems like known issue #446

mykaul · 2026-03-02T16:16:07Z

Can this be reviewed? In my (limited) testing, it did show some nice improvement. It's just a branch or so, but apparently, was impacting to some degree the performance.

…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).

Split recv_results_rows into fast path (no column encryption) and slow path (column encryption enabled): Fast path (common case): - Reads raw column bytes and decodes types in a single pass per row via _decode_row_inline(), eliminating the intermediate list-of-lists - Skips ColDesc namedtuple creation entirely (only needed for CE) - No closure allocation per call Slow path (column encryption): - Preserves full CE logic with ColDesc creation - Moves decode_val/decode_row closures to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead Note: This PR modifies the same method as PR scylladb#630 (which also splits recv_results_rows into CE/non-CE branches). There will be a merge conflict that needs manual resolution if both PRs are accepted.

…nt.bind() When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module. The bind loop is split into three paths: 1. Column encryption policy path (unchanged behavior) 2. Cython serializers path (new fast path) 3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check) Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).

Split recv_results_rows into fast path (no column encryption) and slow path (column encryption enabled): Fast path (common case): - Reads raw column bytes and decodes types in a single pass per row via _decode_row_inline(), eliminating the intermediate list-of-lists - Skips ColDesc namedtuple creation entirely (only needed for CE) - No closure allocation per call Slow path (column encryption): - Preserves full CE logic with ColDesc creation - Moves decode_val/decode_row closures to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead Note: This PR modifies the same method as PR scylladb#630 (which also splits recv_results_rows into CE/non-CE branches). There will be a merge conflict that needs manual resolution if both PRs are accepted.

Split recv_results_rows into fast path (no column encryption) and slow path (column encryption enabled): Fast path (common case): - Reads raw column bytes and decodes types in a single pass per row via _decode_row_inline(), eliminating the intermediate list-of-lists - Skips ColDesc namedtuple creation entirely (only needed for CE) - No closure allocation per call - Wraps decode errors with column name/type info for diagnostics Slow path (column encryption): - Preserves full CE logic with ColDesc creation - Moves decode_val/decode_row closures to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead Note: This PR modifies the same method as PR scylladb#630 (which also splits recv_results_rows into CE/non-CE branches). There will be a merge conflict that needs manual resolution if both PRs are accepted.

Split recv_results_rows() into two code paths: one for when column_encryption_policy is set and one for the common case without it. This avoids the per-value 'ce_policy and ce_policy.contains_column(...)' check, which is redundant for the vast majority of queries.

Split TupleRowParser.unpack_row() into unpack_plain_row() and unpack_col_encrypted_row(), branching once in the callers (ListParser, LazyParser, row_parser error path) rather than per value. - Add boundscheck/wraparound Cython optimizations on both methods - Extract try/except outside the per-column loop - NumpyParser.parse_rows() raises NotImplementedError when column encryption is configured, since it has no decryption support - Update parsing.pxd declarations to match

Apply the same split-path optimization to the bind() method, checking for column_encryption_policy once rather than per parameter.

Cover both the pure-Python path (ResultMessage.recv_results_rows) and the Cython fast path (ListParser, NumpyParser) with and without column encryption policy.

…licy When column_encryption_policy is None (the common case), the ColDesc namedtuple list is never accessed by unpack_plain_row(). Skip building it entirely, passing None to ParseDesc instead. This avoids N namedtuple allocations per query (where N = number of columns) on the Cython fast path for non-encrypted workloads.

mykaul · 2026-04-09T18:31:15Z

Rebased to origin/master + Pure-Python benchmark

Rebased cleanly onto current origin/master (no conflicts). All 16 protocol unit tests + 112 other unit tests pass.

Pure-Python recv_results_rows decode benchmark

The original PR description covers Cython ListParser performance. Here are the pure-Python path numbers (10,000 rows, no column_encryption_policy, best of 5 runs, CPU pinned to single core):

Config                        Master (ms)    PR (ms)  Speedup   Master rows/s       PR rows/s
----------------------------------------------------------------------------------------------------
2 cols (Int32+UTF8)               15.61 ms    12.78 ms    1.22x       640,768/s       782,702/s
10 cols (Int32+UTF8)              41.32 ms    34.21 ms    1.21x       241,999/s       292,289/s
2 cols (all Int32)                13.54 ms    11.26 ms    1.20x       738,461/s       888,109/s
10 cols (all Int32)               41.01 ms    34.46 ms    1.19x       243,837/s       290,209/s

Consistent 1.19–1.22x speedup on the pure-Python path by eliminating:

ColDesc namedtuple allocation per column (N allocations per query)
Per-value column_encryption_policy and ce_policy.contains_column(coldesc) truthiness check
Per-value col_desc lookup from the col_descs list
3-way zip(row, column_metadata, col_descs) → 2-way zip(row, column_metadata)

Environment: Python 3.14.3, no Cython .so for protocol.py.

mykaul · 2026-04-09T19:00:04Z

Updated benchmarks — Cython and Pure-Python paths

All benchmarks: 10,000 rows, no column_encryption_policy (common case), best of 5 runs, CPU pinned to single core (core 15), 1000 iters (Cython) / 200 iters (pure-Python), with warm-up. Python 3.14.3, Cython 3.2.4.

Cython ListParser (obj_parser.pyx + row_parser.pyx)

Config                       Master ns/col-row    PR ns/col-row  Speedup
-------------------------------------------------------------------------
2 cols (Int32+UTF8)                 91.4 ns           81.2 ns     1.13x
10 cols (Int32+UTF8)                88.0 ns           74.1 ns     1.19x
2 cols (all Int32)                  58.6 ns           45.1 ns     1.30x
10 cols (all Int32)                 38.5 ns           28.4 ns     1.36x

Pure-Python (protocol.py recv_results_rows)

Config                       Master ns/col-row    PR ns/col-row  Speedup
-------------------------------------------------------------------------
2 cols (Int32+UTF8)                788.7 ns          649.4 ns     1.21x
10 cols (Int32+UTF8)               432.5 ns          355.1 ns     1.22x
2 cols (all Int32)                 680.8 ns          564.5 ns     1.21x
10 cols (all Int32)                449.0 ns          358.2 ns     1.25x

Summary

Path	Speedup range
Cython ListParser	1.13–1.36x
Pure-Python decode	1.21–1.25x

Cython gains are larger for Int32-only workloads (up to 1.36x) because the per-column overhead eliminated by this PR is a bigger fraction of the total when the deserialization itself is cheap (Int32 = single struct.unpack). Mixed Int32+UTF8 workloads show a smaller but still significant improvement.

dkropachev mentioned this pull request Dec 29, 2025

Optimize column_encryption_policy checks in make_recv_results_rows #639

Open

dkropachev requested changes Dec 29, 2025

View reviewed changes

tests/unit/test_protocol_decode_optimization.py Outdated Show resolved Hide resolved

tests/unit/test_protocol_decode_optimization.py Outdated Show resolved Hide resolved

mykaul force-pushed the remove_enc_cols branch 2 times, most recently from 3a9031e to ad8b6f9 Compare January 5, 2026 08:01

dkropachev requested changes Jan 5, 2026

View reviewed changes

tests/unit/test_protocol_decode_optimization.py Outdated Show resolved Hide resolved

mykaul force-pushed the remove_enc_cols branch from 76c09ae to f44a9ba Compare January 6, 2026 08:13

mykaul force-pushed the remove_enc_cols branch from f44a9ba to 132859f Compare January 18, 2026 12:50

mykaul force-pushed the remove_enc_cols branch from 132859f to d87f917 Compare January 27, 2026 16:29

mykaul mentioned this pull request Feb 1, 2026

[RFC] (improvement)Remove CQL binary protocol v3 #683

Closed

8 tasks

mykaul added the enhancement New feature or request label Feb 8, 2026

mykaul mentioned this pull request Mar 9, 2026

(improvement) perf: remove copies on the read path (us improvements - x1.5-3.7 speedup!) #734

Draft

8 tasks

mykaul force-pushed the remove_enc_cols branch from da4c06d to fe5e968 Compare March 12, 2026 15:36

This was referenced Mar 14, 2026

(improvement) query: add Cython-aware serializer path in BoundStatement.bind() (1-26x speedup - tens/hundreds of us reduced) #749

Draft

(improvement) LWT prepared statement performance: analysis and improvement plan #751

Open

mykaul self-assigned this Mar 16, 2026

mykaul mentioned this pull request Mar 25, 2026

perf: inline row decoding and eliminate closures in recv_results_rows (100's to 1000's of ns, x1.3-1.8 speedup, Python only) #765

Draft

mykaul force-pushed the remove_enc_cols branch 2 times, most recently from 1e272ed to 2b02175 Compare April 2, 2026 08:56

mykaul changed the title ~~Optimize column_encryption_policy checks in recv_results_rows~~ perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup - ms improvements) Apr 7, 2026

mykaul added 5 commits April 9, 2026 21:11

Optimize column_encryption_policy checks in BoundStatement.bind()

dacb79c

Apply the same split-path optimization to the bind() method, checking for column_encryption_policy once rather than per parameter.

Add unit tests for column_encryption_policy optimization

20e7ed6

Cover both the pure-Python path (ResultMessage.recv_results_rows) and the Cython fast path (ListParser, NumpyParser) with and without column encryption policy.

mykaul force-pushed the remove_enc_cols branch from a4498d2 to 3ec0c94 Compare April 9, 2026 18:31

mykaul changed the title ~~perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup - ms improvements)~~ perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python) Apr 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630

perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630
mykaul wants to merge 5 commits intoscylladb:masterfrom
mykaul:remove_enc_cols

mykaul commented Dec 25, 2025 •

edited

Loading

Uh oh!

mykaul commented Dec 25, 2025

Uh oh!

Uh oh!

Uh oh!

dkropachev left a comment

Uh oh!

Uh oh!

mykaul commented Jan 6, 2026

Uh oh!

dkropachev commented Jan 6, 2026

Uh oh!

mykaul commented Jan 6, 2026 •

edited

Loading

Uh oh!

mykaul commented Jan 27, 2026

Uh oh!

mykaul commented Jan 27, 2026

Uh oh!

mykaul commented Mar 2, 2026

Uh oh!

mykaul commented Apr 9, 2026

Uh oh!

mykaul commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mykaul commented Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark (Cython ListParser, 10k rows, no encryption policy)

Pre-review checklist

Uh oh!

mykaul commented Dec 25, 2025

Uh oh!

Uh oh!

Uh oh!

dkropachev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mykaul commented Jan 6, 2026

Uh oh!

dkropachev commented Jan 6, 2026

Uh oh!

mykaul commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mykaul commented Jan 27, 2026

Uh oh!

mykaul commented Jan 27, 2026

Uh oh!

mykaul commented Mar 2, 2026

Uh oh!

mykaul commented Apr 9, 2026

Rebased to origin/master + Pure-Python benchmark

Pure-Python recv_results_rows decode benchmark

Uh oh!

mykaul commented Apr 9, 2026

Updated benchmarks — Cython and Pure-Python paths

Cython ListParser (obj_parser.pyx + row_parser.pyx)

Pure-Python (protocol.py recv_results_rows)

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mykaul commented Dec 25, 2025 •

edited

Loading

mykaul commented Jan 6, 2026 •

edited

Loading