(improvement) query: add Cython-aware serializer path in BoundStatement.bind() (1-26x speedup - tens/hundreds of us reduced) #749

Draft
mykaul wants to merge 3 commits into scylladb:master from mykaul:perf/cython-bind-path

mykaul commented Mar 14, 2026

Summary

Dependencies

Benchmark

Measured as the min() of timeit.repeat(repeat=7, number=50_000) on a quiet machine (load < 2.5). The benchmark times BoundStatement.bind() end to end, comparing the Cython serializer path against the Python fallback path.
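The measurement method described above can be sketched as follows. This is a minimal illustration, not the benchmark script from the PR: ns_per_call and sample_workload are hypothetical names, and the workload is a stand-in for a real BoundStatement.bind() call.

```python
import timeit


def ns_per_call(func, repeat=7, number=50_000):
    """Best-case nanoseconds per call for a zero-argument callable."""
    # timeit.repeat returns one total duration (in seconds) per repetition.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the repetition least disturbed by other system activity.
    return min(totals) / number * 1e9


def sample_workload():
    # Hypothetical stand-in for prepared.bind(values).
    return sum(range(16))


best = ns_per_call(sample_workload, repeat=3, number=1_000)
print(f"{best:.0f} ns/call")
```

Taking the minimum rather than the mean reports the run with the least interference, which is why a quiet machine matters for this kind of comparison.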

| Workload | Python fallback (ns/call) | Cython path (ns/call) | Speedup |
| --- | --- | --- | --- |
| Vector<float, 128> (1 col) | 11460 | 2847 | 4.0x |
| Vector<float, 1536> (1 col) | 112896 | 10397 | 10.9x |
| 10 mixed scalars (5 float + 3 int32 + 2 text) | 2174 | 3748 | 0.6x (overhead) |

Key observations:

  • Vector columns get significant speedups (4-11x) because the Cython serializer replaces the per-element io.BytesIO loop with a pre-allocated C buffer and inline byte-swap
  • Scalar-only workloads are slower (0.6x in the mixed-scalar row) due to Serializer dispatch overhead: the generic fallback still calls cqltype.serialize() under the hood. The Cython path is used only when _serializers is cached, so a mixed-scalar workload that does not want the Cython path falls back to the Python path automatically when no serializers are cached
  • Speedup grows with vector dimension (4.0x at 128 floats, 10.9x at 1536 floats)
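To illustrate the first observation in plain Python, the sketch below contrasts a per-element io.BytesIO loop with a single write into a pre-allocated buffer. Both function names are hypothetical, and struct.pack_into only approximates the idea: the PR's Cython serializer does the byte swap in C, which this sketch does not reproduce.

```python
import io
import struct


def serialize_vector_per_element(values):
    """One struct.pack and one BytesIO.write per element."""
    buf = io.BytesIO()
    for v in values:
        buf.write(struct.pack(">f", v))  # big-endian 4-byte float
    return buf.getvalue()


def serialize_vector_preallocated(values):
    """Allocate the whole output once, then fill it with a single call."""
    n = len(values)
    out = bytearray(4 * n)  # 4 bytes per float element
    struct.pack_into(f">{n}f", out, 0, *values)
    return bytes(out)


vec = [0.5, -1.25, 3.0, 1e-3]
assert serialize_vector_per_element(vec) == serialize_vector_preallocated(vec)
```

Both functions produce identical bytes. The difference is the number of Python-level calls per element, which is the cost that grows with vector dimension.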

Tests

  • tests/unit/test_parameter_binding.py: comprehensive bind path tests covering Cython path, Python fallback, error wrapping, UNSET_VALUE handling
  • Full unit test suite passes (740 passed, 43 skipped)

mykaul marked this pull request as draft March 14, 2026 11:22
mykaul requested a review from Copilot March 14, 2026 19:24

Copilot AI left a comment

Pull request overview

This PR introduces an optional, Cython-accelerated serialization path for BoundStatement.bind() to reduce per-value overhead (especially for large VectorType columns) when Cython serializers are available and no column encryption policy is enabled.

Changes:

  • Add a new Cython cassandra.serializers extension (with .pyx + .pxd) providing Serializer implementations and a make_serializers() factory.
  • Add lazy caching of per-column serializer objects on PreparedStatement via a _serializers property.
  • Split the bind loop into three branches: column encryption policy, Cython fast path, and pure-Python fallback (with reduced per-value overhead in the fallback).
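The lazy per-statement cache in the second bullet could look roughly like the sketch below. PreparedStatementSketch and its attributes are hypothetical. The only names taken from the PR are the _serializers property and the make_serializers() factory, whose real signature is assumed here to take the column types.

```python
class PreparedStatementSketch:
    """Hypothetical stand-in for PreparedStatement, showing only the cache."""

    def __init__(self, column_types, make_serializers=None):
        self.column_types = column_types
        # None models "the Cython extension could not be imported".
        self._make_serializers = make_serializers
        self._cached_serializers = None
        self._resolved = False

    @property
    def _serializers(self):
        # Build the serializer list on first access, then reuse it.
        if not self._resolved:
            if self._make_serializers is not None:
                self._cached_serializers = self._make_serializers(self.column_types)
            self._resolved = True
        return self._cached_serializers  # None means "use another bind path"


calls = []


def counting_factory(column_types):
    # Illustrative factory that records how often it is invoked.
    calls.append(1)
    return [f"serializer-for-{t}" for t in column_types]


stmt = PreparedStatementSketch(["float", "int"], counting_factory)
first = stmt._serializers
second = stmt._serializers
```

Because the result is cached on the statement, the factory cost is paid once per prepared statement rather than once per bind() call.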

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cassandra/serializers.pyx | Adds Cython serializer implementations (scalar + VectorType) and lookup/factory functions. |
| cassandra/serializers.pxd | Exposes the Serializer cdef interface for Cython usage. |
| cassandra/query.py | Integrates the optional Cython serializer path into PreparedStatement/BoundStatement.bind() with lazy caching and branch selection. |

Copilot AI left a comment

Pull request overview

Adds an optimized parameter binding path that can leverage Cython Serializer objects (when available) to speed up BoundStatement.bind(), while preserving the existing column-encryption behavior and improving the plain-Python fallback.

Changes:

  • Add PreparedStatement._serializers lazy cache and a three-way bind loop in BoundStatement.bind() (CE policy / Cython fast path / Python fallback).
  • Introduce Cython Serializer implementations for Float/Double/Int32/Vector and a make_serializers() factory.
  • Extend unit tests to exercise the Cython-serializer bind branch via injected stub serializers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cassandra/query.py | Adds Cython-serializer availability detection, caches serializers on PreparedStatement, and selects the new bind fast path when safe. |
| cassandra/serializers.pyx | Implements Cython serializers (including optimized VectorType) and factory/lookup helpers. |
| cassandra/serializers.pxd | Declares the Cython Serializer interface for cross-module typing. |
| tests/unit/test_parameter_binding.py | Adds tests for the new bind branch and error-wrapping behavior using stub serializers. |

Copilot AI left a comment

Pull request overview

Adds a Cython-aware fast path to BoundStatement.bind() so prepared statements can use cached per-column serializer objects (when available and no column encryption policy is active), reducing Python dispatch and per-value overhead during binding.

Changes:

  • Add lazy PreparedStatement._serializers cache and split BoundStatement.bind() into CE-policy, Cython-serializer, and pure-Python paths.
  • Introduce Cython Serializer implementations (including optimized VectorType serialization) and a make_serializers() factory.
  • Expand/adjust unit tests to cover the new Cython bind branch via injected stub serializers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cassandra/query.py | Adds serializer caching and a new bind fast path (plus significant formatting-only edits in the same module). |
| cassandra/serializers.pyx | Introduces Cython serializer implementations and factory/lookup helpers. |
| cassandra/serializers.pxd | Declares the Cython Serializer interface for cross-module use. |
| tests/unit/test_parameter_binding.py | Adds tests to exercise the Cython bind path without requiring compiled Cython. |

Copilot AI left a comment

Pull request overview

Adds a Cython-aware fast path to BoundStatement.bind() by reworking the bind loop into distinct branches (column encryption vs Cython serializers vs pure-Python), and introduces the Cython serializer module used by the new path.

Changes:

  • Add lazy cached PreparedStatement._serializers and update BoundStatement.bind() to use Cython Serializer objects when available and CE policy is not active.
  • Factor bind-time serialization error wrapping into a shared helper and extend wrapping to include OverflowError.
  • Add unit tests that exercise the new Cython bind branch via injected stub serializers, plus overflow wrapping coverage.
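A shared error wrapper of the kind described in the second bullet might be shaped like this. It is a sketch under stated assumptions: wrap_bind_error and bind_float are hypothetical names, and the message format is illustrative rather than copied from the PR.

```python
import struct


def wrap_bind_error(col_name, expected_type, value, exc):
    """Build one uniform TypeError for any serialization failure."""
    actual_type = type(value).__name__
    return TypeError(
        f'Received an argument of invalid type for column "{col_name}". '
        f"Expected: {expected_type}, Got: {actual_type}; ({exc})"
    )


def bind_float(col_name, value):
    """Serialize one float column value, wrapping low-level errors."""
    try:
        return struct.pack(">f", value)
    except (TypeError, struct.error, OverflowError) as exc:
        # OverflowError covers values too large for a 4-byte float.
        raise wrap_bind_error(col_name, "float", value, exc) from exc
```

With OverflowError in the except clause, an out-of-range value such as 1e40 surfaces as the same TypeError a caller would see for a wrongly typed argument.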

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cassandra/query.py | Adds _serializers cache on PreparedStatement, a shared bind error wrapper, and splits BoundStatement.bind() into CE / Cython / Python paths. |
| cassandra/serializers.pyx | Introduces Cython Serializer implementations (scalar + vector) and serializer factory helpers. |
| cassandra/serializers.pxd | Exposes the Serializer cdef interface for Cython interop. |
| tests/unit/test_parameter_binding.py | Adds unit tests to validate the new Cython bind path behavior and error wrapping. |

Copilot AI left a comment

Pull request overview

Adds an optimized binding path that leverages Cython-backed per-column Serializer objects (when available) to reduce Python overhead in BoundStatement.bind(), while keeping existing behavior for column encryption policy and providing a streamlined pure-Python fallback.

Changes:

  • Add lazy, cached Cython serializer lookup on PreparedStatement and route BoundStatement.bind() through a 3-way path: CE-policy, Cython fast path, pure-Python fallback.
  • Introduce Cython serializer implementations (Serializer, scalar serializers, SerVectorType, and factories).
  • Extend unit tests to exercise the Cython fast-path behavior (via injected stub serializers) and UNSET handling scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cassandra/query.py | Adds optional Cython serializer import, caches per-column serializers on PreparedStatement, and refactors BoundStatement.bind() into CE/Cython/Python paths with pre-allocation. |
| cassandra/serializers.pyx | Introduces Cython-optimized serializers (including VectorType) and factory functions for per-column serializer creation. |
| cassandra/serializers.pxd | Declares the Cython Serializer interface for cross-module typing/cimports. |
| tests/unit/test_parameter_binding.py | Adds tests that inject stub serializers to validate the Cython bind path and UNSET behavior across paths. |


mykaul force-pushed the perf/cython-bind-path branch from 9aabe6b to 414f0cd Compare April 3, 2026 18:12
mykaul changed the title from "(improvement) query: add Cython-aware serializer path in BoundStatement.bind()" to "(improvement) query: add Cython-aware serializer path in BoundStatement.bind() (1-26x speedup - tens/hundreds of us reduced)" Apr 7, 2026
mykaul force-pushed the perf/cython-bind-path branch from 414f0cd to ee511ea Compare April 7, 2026 08:16
mykaul added 3 commits April 7, 2026 12:17
…torType

Add cassandra/serializers.pyx and cassandra/serializers.pxd implementing
Cython-optimized serialization that mirrors the deserializers.pyx architecture.

Implements type-specialized serializers for the three subtypes commonly used
in vector columns:
- SerFloatType: 4-byte big-endian IEEE 754 float
- SerDoubleType: 8-byte big-endian double
- SerInt32Type: 4-byte big-endian signed int32
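For reference, the three wire encodings listed above can be reproduced in pure Python with the struct module. The function names are hypothetical, and this shows only the byte layout, not the Cython implementation.

```python
import struct


def ser_float(value):
    return struct.pack(">f", value)  # 4-byte big-endian IEEE 754 float


def ser_double(value):
    return struct.pack(">d", value)  # 8-byte big-endian double


def ser_int32(value):
    return struct.pack(">i", value)  # 4-byte big-endian signed int32


assert ser_float(1.0) == b"\x3f\x80\x00\x00"
assert ser_double(1.0) == b"\x3f\xf0\x00\x00\x00\x00\x00\x00"
assert ser_int32(-2) == b"\xff\xff\xff\xfe"
```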

SerVectorType pre-allocates a contiguous buffer and uses C-level byte swapping
for float/double/int32 vectors, with a generic fallback for other subtypes.
GenericSerializer delegates to the Python-level cqltype.serialize() classmethod.

Factory functions find_serializer() and make_serializers() allow easy lookup
and batch creation of serializers for column types.

Benchmarks show ~30x speedup over the current io.BytesIO baseline and ~3x
speedup over Python struct.pack for Vector<float, 1536> serialization.

No setup.py changes needed - the existing cassandra/*.pyx glob already picks
up new .pyx files.

…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
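The three-way split described in this commit message can be sketched as below. Everything here is a hypothetical stand-in (FloatTypeSketch, FloatSerializerSketch, choose_bind_path, bind): the serialize(value, protocol_version) signature is an assumption, the column encryption branch is left unimplemented, and None/UNSET_VALUE handling is omitted.

```python
import struct


class FloatTypeSketch:
    """Hypothetical cqltype exposing a serialize() classmethod."""

    @classmethod
    def serialize(cls, value, protocol_version):
        return struct.pack(">f", value)


class FloatSerializerSketch:
    """Hypothetical cached per-column Serializer object."""

    def serialize(self, value, protocol_version):
        return struct.pack(">f", value)


def choose_bind_path(ce_policy, serializers):
    # The column encryption policy takes priority over the fast path.
    if ce_policy is not None:
        return "ce_policy"
    if serializers is not None:
        return "cython"
    return "python"


def bind(values, col_types, protocol_version=4, ce_policy=None, serializers=None):
    path = choose_bind_path(ce_policy, serializers)
    if path == "ce_policy":
        raise NotImplementedError("encryption path is out of scope for this sketch")
    # The branch is chosen once, outside the per-value loop.
    encoders = serializers if path == "cython" else col_types
    out = [None] * len(values)  # pre-allocated result list
    for i, value in enumerate(values):
        out[i] = encoders[i].serialize(value, protocol_version)
    return path, out
```

Selecting the path before the loop is what removes the per-value policy check from the two paths that do not use column encryption.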