Skip to content

perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630

Open
mykaul wants to merge 5 commits intoscylladb:masterfrom
mykaul:remove_enc_cols
Open

perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python)#630
mykaul wants to merge 5 commits intoscylladb:masterfrom
mykaul:remove_enc_cols

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Dec 25, 2025

Optimize column_encryption_policy checks in the result parsing hot path.

Previously, every decoded value checked ce_policy and ce_policy.contains_column(coldesc) even when no encryption policy was configured (the common case). This PR splits the parsing into two code paths — unpack_plain_row() and unpack_col_encrypted_row() — and branches once per result set instead of per value.

Changes:

  • protocol.py: Split recv_results_rows() pure-Python path
  • obj_parser.pyx: Split TupleRowParser.unpack_row() into unpack_plain_row() / unpack_col_encrypted_row(), with @boundscheck(False) / @wraparound(False) and try/except moved outside the per-column loop
  • ListParser / LazyParser / row_parser: Branch once on column_encryption_policy
  • numpy_parser.pyx: NumpyParser.parse_rows() raises NotImplementedError when column encryption is configured (it has no decryption support)
  • query.py: Same split-path optimization in BoundStatement.bind()

Benchmark (Cython ListParser, 10k rows, no encryption policy)

                      Master            PR            Speedup
2 cols (Int32+UTF8):  3.67 ms           2.08 ms       1.77x  (2.7M -> 4.8M rows/s)
10 cols (all Int32):  8.37 ms           5.18 ms       1.62x  (1.2M -> 1.9M rows/s)

Improvement comes from eliminating per-value overhead in the hot loop:

  • ce_policy and ce_policy.contains_column(coldesc) truthiness check
  • coldesc = desc.coldescs[i] lookup (not needed in plain path)
  • try/except per value moved outside the loop
  • @cython.boundscheck(False) / @cython.wraparound(False) decorators

Fixes: #582

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Dec 25, 2025

CI failure is unrelated, and I've seen it happening on too many PRs already :-/

@mykaul mykaul force-pushed the remove_enc_cols branch 2 times, most recently from 3a9031e to ad8b6f9 Compare January 5, 2026 08:01
Copy link
Copy Markdown
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy_parser.pyx, parsing.pyx also needs to be fixed

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Jan 6, 2026

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there:
numpy - it doesn't use/call column_encryption
parsing.pyx - there's just 'not implemented' functions there.

@dkropachev
Copy link
Copy Markdown
Collaborator

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there: numpy - it doesn't use/call column_encryption parsing.pyx - there's just 'not implemented' functions there.

Instead of changing only TupleRowParser class you need to do similar changes to ColumnParser and RowParser and all their implementations: ListParser, LazyParser and NumpyParser, you will discover that NumpyParser disregards column_encryption_policy which means that it does not work properly when when columns are encrypted, have a plug there that throwing NotImplementedError.
In total all Parser implementation classes should contain implementations for:

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

While Abstract classes:

cdef class ColumnParser:
    cpdef parse_rows(self, BytesIOReader reader, ParseDesc desc):
        if desc.column_encryption_policy:
            return self.unpack_enc_row(reader, desc)
        return self.unpack_plain_row(reader, desc)

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Jan 6, 2026

numpy_parser.pyx, parsing.pyx also needs to be fixed

It wasn't clear to me what needs to fixing there: numpy - it doesn't use/call column_encryption parsing.pyx - there's just 'not implemented' functions there.

Instead of changing only TupleRowParser class you need to do similar changes to ColumnParser and RowParser and all their implementations: ListParser, LazyParser and NumpyParser, you will discover that NumpyParser disregards column_encryption_policy which means that it does not work properly when when columns are encrypted, have a plug there that throwing NotImplementedError. In total all Parser implementation classes should contain implementations for:

Let me see if I understand this:

  • TupleRowParser - Implemented unpack_ce_row() and unpack_row()
  • ColumnParser - its parse_rows() creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
  • RowParser- its unpack_row() is raising NotImplementedError - I'm not sure I should change this in this PR?
  • ListParser - its parse_rows() creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
  • LazyParser - its parse_rows() calls parse_rows_lazy() - which creates a TupleRowParser object - and then I call its unpack_ce_row() / unpack_row()
  • NumpyParser its parse_rows() calls its _parse_rows() which completely ignores encryption and calls unpack_row() - but I have no idea where it's implemented. BUT - I'm not introducing a regression here, no?
  • BoundStatement.bind() - added in the last commit in the series.

Is that an accurate representation?

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

I'm not sure I understand the interaction between Python and Cython - I was not going to change the interfaces whatsoever.

While Abstract classes:

cdef class ColumnParser:
    cpdef parse_rows(self, BytesIOReader reader, ParseDesc desc):
        if desc.column_encryption_policy:
            return self.unpack_enc_row(reader, desc)
        return self.unpack_plain_row(reader, desc)

    cpdef parse_plain_rows(self, BytesIOReader reader, ParseDesc desc)

    cpdef parse_enc_rows(self, BytesIOReader reader, ParseDesc desc)

You raise a good point about the names of the functions - that can clearly be more clear with 'plain' and 'encrypted' in them! I'll fix that.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Jan 27, 2026

You raise a good point about the names of the functions - that can clearly be more clear with 'plain' and 'encrypted' in them! I'll fix that.

Done in latest commit. I think it's ready for re-review.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Jan 27, 2026

CI failure seems like known issue #446

@mykaul mykaul added the enhancement New feature or request label Feb 8, 2026
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Mar 2, 2026

Can this be reviewed? In my (limited) testing, it did show some nice improvement. It's just a branch or so, but apparently, was impacting to some degree the performance.

mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 14, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
@mykaul mykaul self-assigned this Mar 16, 2026
mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 16, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 20, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 25, 2026
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):

Fast path (common case):
  - Reads raw column bytes and decodes types in a single pass per row
    via _decode_row_inline(), eliminating the intermediate list-of-lists
  - Skips ColDesc namedtuple creation entirely (only needed for CE)
  - No closure allocation per call

Slow path (column encryption):
  - Preserves full CE logic with ColDesc creation
  - Moves decode_val/decode_row closures to module-level functions
    (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
@mykaul mykaul force-pushed the remove_enc_cols branch 2 times, most recently from 1e272ed to 2b02175 Compare April 2, 2026 08:56
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 3, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 7, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 7, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 7, 2026
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):

Fast path (common case):
  - Reads raw column bytes and decodes types in a single pass per row
    via _decode_row_inline(), eliminating the intermediate list-of-lists
  - Skips ColDesc namedtuple creation entirely (only needed for CE)
  - No closure allocation per call

Slow path (column encryption):
  - Preserves full CE logic with ColDesc creation
  - Moves decode_val/decode_row closures to module-level functions
    (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
@mykaul mykaul changed the title Optimize column_encryption_policy checks in recv_results_rows perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup - ms improvements) Apr 7, 2026
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 9, 2026
Split recv_results_rows into fast path (no column encryption) and slow
path (column encryption enabled):

Fast path (common case):
  - Reads raw column bytes and decodes types in a single pass per row
    via _decode_row_inline(), eliminating the intermediate list-of-lists
  - Skips ColDesc namedtuple creation entirely (only needed for CE)
  - No closure allocation per call
  - Wraps decode errors with column name/type info for diagnostics

Slow path (column encryption):
  - Preserves full CE logic with ColDesc creation
  - Moves decode_val/decode_row closures to module-level functions
    (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Note: This PR modifies the same method as PR scylladb#630 (which also splits
recv_results_rows into CE/non-CE branches). There will be a merge
conflict that needs manual resolution if both PRs are accepted.
mykaul added 5 commits April 9, 2026 21:11
Split recv_results_rows() into two code paths: one for when
column_encryption_policy is set and one for the common case without it.
This avoids the per-value 'ce_policy and ce_policy.contains_column(...)'
check, which is redundant for the vast majority of queries.
Split TupleRowParser.unpack_row() into unpack_plain_row() and
unpack_col_encrypted_row(), branching once in the callers (ListParser,
LazyParser, row_parser error path) rather than per value.

- Add boundscheck/wraparound Cython optimizations on both methods
- Extract try/except outside the per-column loop
- NumpyParser.parse_rows() raises NotImplementedError when column
  encryption is configured, since it has no decryption support
- Update parsing.pxd declarations to match
Apply the same split-path optimization to the bind() method, checking
for column_encryption_policy once rather than per parameter.
Cover both the pure-Python path (ResultMessage.recv_results_rows) and
the Cython fast path (ListParser, NumpyParser) with and without
column encryption policy.
…licy

When column_encryption_policy is None (the common case), the ColDesc
namedtuple list is never accessed by unpack_plain_row(). Skip building
it entirely, passing None to ParseDesc instead.

This avoids N namedtuple allocations per query (where N = number of
columns) on the Cython fast path for non-encrypted workloads.
@mykaul mykaul force-pushed the remove_enc_cols branch from a4498d2 to 3ec0c94 Compare April 9, 2026 18:31
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 9, 2026

Rebased to origin/master + Pure-Python benchmark

Rebased cleanly onto current origin/master (no conflicts). All 16 protocol unit tests + 112 other unit tests pass.

Pure-Python recv_results_rows decode benchmark

The original PR description covers Cython ListParser performance. Here are the pure-Python path numbers (10,000 rows, no column_encryption_policy, best of 5 runs, CPU pinned to single core):

Config                        Master (ms)    PR (ms)  Speedup   Master rows/s       PR rows/s
----------------------------------------------------------------------------------------------------
2 cols (Int32+UTF8)               15.61 ms    12.78 ms    1.22x       640,768/s       782,702/s
10 cols (Int32+UTF8)              41.32 ms    34.21 ms    1.21x       241,999/s       292,289/s
2 cols (all Int32)                13.54 ms    11.26 ms    1.20x       738,461/s       888,109/s
10 cols (all Int32)               41.01 ms    34.46 ms    1.19x       243,837/s       290,209/s

Consistent 1.19–1.22x speedup on the pure-Python path by eliminating:

  • ColDesc namedtuple allocation per column (N allocations per query)
  • Per-value column_encryption_policy and ce_policy.contains_column(coldesc) truthiness check
  • Per-value col_desc lookup from the col_descs list
  • 3-way zip(row, column_metadata, col_descs) → 2-way zip(row, column_metadata)

Environment: Python 3.14.3, no Cython .so for protocol.py.

@mykaul mykaul changed the title perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup - ms improvements) perf: Optimize column_encryption_policy checks in recv_results_rows (1.7x speedup for Cython, 1.2x for Python) Apr 9, 2026
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 9, 2026

Updated benchmarks — Cython and Pure-Python paths

All benchmarks: 10,000 rows, no column_encryption_policy (common case), best of 5 runs, CPU pinned to single core (core 15), 1000 iters (Cython) / 200 iters (pure-Python), with warm-up. Python 3.14.3, Cython 3.2.4.

Cython ListParser (obj_parser.pyx + row_parser.pyx)

Config                       Master ns/col-row    PR ns/col-row  Speedup
-------------------------------------------------------------------------
2 cols (Int32+UTF8)                 91.4 ns           81.2 ns     1.13x
10 cols (Int32+UTF8)                88.0 ns           74.1 ns     1.19x
2 cols (all Int32)                  58.6 ns           45.1 ns     1.30x
10 cols (all Int32)                 38.5 ns           28.4 ns     1.36x

Pure-Python (protocol.py recv_results_rows)

Config                       Master ns/col-row    PR ns/col-row  Speedup
-------------------------------------------------------------------------
2 cols (Int32+UTF8)                788.7 ns          649.4 ns     1.21x
10 cols (Int32+UTF8)               432.5 ns          355.1 ns     1.22x
2 cols (all Int32)                 680.8 ns          564.5 ns     1.21x
10 cols (all Int32)                449.0 ns          358.2 ns     1.25x

Summary

Path Speedup range
Cython ListParser 1.13–1.36x
Pure-Python decode 1.21–1.25x

Cython gains are larger for Int32-only workloads (up to 1.36x) because the per-column overhead eliminated by this PR is a bigger fraction of the total when the deserialization itself is cheap (Int32 = single struct.unpack). Mixed Int32+UTF8 workloads show a smaller but still significant improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

column_encryption_policy is checked in reading every single value in decode_val()

2 participants