
feat: NumPy-accelerated vector serialization in VectorType (huge improvements in some use cases) #793

Draft

mykaul wants to merge 3 commits into scylladb:master from mykaul:numpy-vector-serialize

Conversation


@mykaul mykaul commented Apr 5, 2026

Summary

  • Add three fast serialization paths to VectorType for fixed-size numeric subtypes (float, double, int, bigint), delivering 70–300× speedups for high-dimensional vector inserts
  • Add serialize_numpy_bulk() classmethod for batch workloads (e.g. loading embeddings from Parquet files in VectorDBBench)
  • NumPy remains optional; all new code is guarded by try/except ImportError

Motivation

The VectorDBBench use case (https://github.com/scylladb/VectorDBBench) pipelines data as Parquet → PyArrow/NumPy → wire. The existing VectorType.serialize() performed 768+ individual struct.pack('>f', val) + BytesIO.write() calls per vector, making serialization the bottleneck for bulk inserts.

What's new

Three fast paths in VectorType.serialize():

  1. bytes/bytearray passthrough – if the caller already holds a correctly-sized blob (e.g. from serialize_numpy_bulk()), return it directly with zero conversion
  2. NumPy ndarray fast path – for a 1-D ndarray, byte-swap to big-endian via np.asarray(v, dtype='>f4').tobytes() instead of per-element struct.pack
  3. serialize_numpy_bulk() classmethod – byte-swap an entire 2-D array (N_rows × dim) once and slice the raw buffer into a list[bytes]

Benchmark results

768-dim float32, 100-row batches (Python 3.14, NumPy 2.3, best of 3):

| Method | ns/call | µs/row | Speedup |
| --- | ---: | ---: | ---: |
| list (baseline) | 6,196,012 | 61.96 | 1× |
| numpy (per-row) | 85,245 | 0.85 | 73× |
| bulk (serialize_numpy_bulk) | 41,239 | 0.41 | 150× |
| bytes passthrough | 17,980 | 0.18 | 345× |
| bulk+passthrough (end-to-end) | 60,216 | 0.60 | 103× |

In absolute terms: serializing 100 × 768-dim float32 vectors drops from ~6.2 ms (baseline) to ~60 µs (bulk+passthrough end-to-end) — a 103× improvement. The bottleneck was 76,800 individual struct.pack('>f', val) + BytesIO.write() calls (768 elements × 100 rows); NumPy replaces all of that with a single array byte-swap + buffer slice.

768-dim float32, single vector (latency per call):

| Method | ns/call | Speedup |
| --- | ---: | ---: |
| list (baseline) | 65,181 | 1× |
| numpy (per-row) | 963 | 68× |
| bytes passthrough | 237 | 275× |

Per-vector latency: a single 768-dim vector goes from ~65 µs to under 1 µs via the NumPy path, or 237 ns if pre-serialized bytes are reused.
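The per-row speedup comes from replacing the per-element loop with a single array byte-swap. A standalone sketch (not the driver's code) showing that the two approaches produce byte-identical output for a float subtype:

```python
import struct
import numpy as np

def serialize_elementwise(vec) -> bytes:
    # Baseline: one struct.pack('>f', ...) call per element.
    return b''.join(struct.pack('>f', float(v)) for v in vec)

def serialize_ndarray(vec: np.ndarray) -> bytes:
    # Fast path: a single byte-swap + copy for the whole 1-D array.
    return np.asarray(vec, dtype='>f4').tobytes()

vec = np.arange(768, dtype=np.float32) / 7.0
assert serialize_elementwise(vec) == serialize_ndarray(vec)
```

Because the input is already float32, the widening to Python float and repacking in the baseline is lossless, so the outputs match exactly.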

Safety

  • Variable-size subtypes (smallint, tinyint, text) are excluded from the fast path and continue to use the original element-by-element serialization
  • Passing raw bytes for a variable-size subtype now raises TypeError explicitly (rather than silently falling through)
  • _numpy_dtype is only set when subtype.serial_size() is not None and the typename is in the 4-entry _NUMPY_DTYPE_MAP
  • Comprehensive unit tests (97 pass) cover correctness, round-trip fidelity, error handling, and fallback behavior
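The guard described in the bullets above can be sketched as follows. The type names and the `serial_size()` protocol mirror the description; the exact map contents and lookup mechanism in the driver may differ:

```python
# Hypothetical 4-entry map: fixed-size numeric CQL types -> big-endian dtypes.
_NUMPY_DTYPE_MAP = {
    'FloatType': '>f4',   # CQL float  (4 bytes)
    'DoubleType': '>f8',  # CQL double (8 bytes)
    'Int32Type': '>i4',   # CQL int    (4 bytes)
    'LongType': '>i8',    # CQL bigint (8 bytes)
}

def numpy_dtype_for(subtype):
    # Variable-size subtypes report serial_size() of None and therefore
    # never get a dtype: they stay on the element-by-element path.
    if subtype.serial_size() is None:
        return None
    return _NUMPY_DTYPE_MAP.get(type(subtype).__name__)
```

Any subtype that maps to `None` falls back to the original serialization, so the fast paths can never change behavior for types outside the map.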

Commits

  1. feat: NumPy-accelerated vector serialization – all code + 97 unit tests
  2. bench: add vector NumPy serialization benchmark – standalone benchmark script with ns/call column
  3. docs: document NumPy-accelerated vector serialization – usage examples in docs/performance.rst


mykaul commented Apr 5, 2026

CC @swasik

@mykaul mykaul marked this pull request as draft April 5, 2026 06:52
@mykaul mykaul force-pushed the numpy-vector-serialize branch 2 times, most recently from af88c0e to fba36f3 Compare April 5, 2026 12:51
mykaul added 3 commits April 5, 2026 16:05

feat: NumPy-accelerated vector serialization

Add three fast paths to VectorType.serialize() for the common case of
fixed-size numeric subtypes (float, double, int, bigint):

1. bytes/bytearray passthrough – skip all conversion when the caller
   already holds a correctly-sized blob (e.g. from serialize_numpy_bulk).

2. NumPy ndarray fast path – convert a 1-D numpy array to big-endian
   bytes via asarray(dtype=...).tobytes() instead of 768+ individual
   struct.pack + BytesIO.write calls.

3. serialize_numpy_bulk() classmethod – byte-swap an entire 2-D array
   (N rows × dim columns) once and slice the raw buffer, yielding one
   bytes object per row with zero per-element overhead.

Benchmarks on 768-dim float32 vectors show 70-300× speedups depending
on the path, directly benefiting bulk-insert workloads such as loading
embeddings from Parquet files (VectorDBBench use case).

NumPy remains an optional dependency; all new code is guarded by
try/except ImportError.  Variable-size subtypes (smallint, tinyint,
text, etc.) are excluded and continue to use the original
element-by-element path.

Unit tests cover correctness, round-trip fidelity, error handling, and
fallback behavior for all three paths.

bench: add vector NumPy serialization benchmark

Standalone benchmark comparing the four serialization paths for
VectorType across dimensions (128, 768, 1536) and batch sizes
(1, 100, 10000):

  - list (element-by-element)   – baseline
  - numpy (per-row ndarray)     – single-row fast path
  - bulk (serialize_numpy_bulk) – batch fast path
  - bytes passthrough (bind)    – pre-serialized blob

Includes auto-calibrated iteration counts and correctness verification.

docs: document NumPy-accelerated vector serialization

Add a new section to docs/performance.rst covering the three fast
serialization paths (single-row ndarray, bulk serialize_numpy_bulk,
bytes passthrough) with usage examples and supported subtype list.
@mykaul mykaul force-pushed the numpy-vector-serialize branch from fba36f3 to 9e4bf4a Compare April 5, 2026 13:06
@mykaul mykaul changed the title from "feat: NumPy-accelerated vector serialization in VectorType" to "feat: NumPy-accelerated vector serialization in VectorType (huge improvements in some use cases)" Apr 7, 2026