feat: NumPy-accelerated vector serialization in VectorType (huge improvements in some use cases) #793
Draft
mykaul wants to merge 3 commits into scylladb:master
Conversation
Author
CC @swasik
Force-pushed af88c0e to fba36f3
Add three fast paths to VectorType.serialize() for the common case of fixed-size numeric subtypes (float, double, int, bigint):

1. bytes/bytearray passthrough – skip all conversion when the caller already holds a correctly-sized blob (e.g. from serialize_numpy_bulk).
2. NumPy ndarray fast path – convert a 1-D numpy array to big-endian bytes via asarray(dtype=...).tobytes() instead of 768+ individual struct.pack + BytesIO.write calls.
3. serialize_numpy_bulk() classmethod – byte-swap an entire 2-D array (N rows × dim columns) once and slice the raw buffer, yielding one bytes object per row with zero per-element overhead.

Benchmarks on 768-dim float32 vectors show 70-300× speedups depending on the path, directly benefiting bulk-insert workloads such as loading embeddings from Parquet files (VectorDBBench use case). NumPy remains an optional dependency; all new code is guarded by try/except ImportError. Variable-size subtypes (smallint, tinyint, text, etc.) are excluded and continue to use the original element-by-element path. Unit tests cover correctness, round-trip fidelity, error handling, and fallback behavior for all three paths.
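The single-row ndarray fast path can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function names are hypothetical, and only the float32 (`'>f'` / `'>f4'`) case is shown.

```python
# Illustrative sketch: original element-by-element serialization vs. the
# NumPy big-endian fast path. Function names are hypothetical.
import struct
from io import BytesIO
import numpy as np

def serialize_elementwise(values):
    # Original path: one struct.pack('>f') + BytesIO.write per element.
    buf = BytesIO()
    for v in values:
        buf.write(struct.pack('>f', v))
    return buf.getvalue()

def serialize_ndarray(arr):
    # Fast path: a single dtype conversion to big-endian float32,
    # then one copy out of the array buffer.
    return np.asarray(arr, dtype='>f4').tobytes()

vec = np.arange(768, dtype=np.float32)
assert serialize_ndarray(vec) == serialize_elementwise(vec)
```

Both paths produce identical wire bytes; the NumPy version simply collapses the per-element Python loop into vectorized C.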
Standalone benchmark comparing the four serialization paths for VectorType across dimensions (128, 768, 1536) and batch sizes (1, 100, 10000):

- list (element-by-element) – baseline
- numpy (per-row ndarray) – single-row fast path
- bulk (serialize_numpy_bulk) – batch fast path
- bytes passthrough (bind) – pre-serialized blob

Includes auto-calibrated iteration counts and correctness verification.
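The auto-calibration idea can be sketched roughly like this; it is a hypothetical helper in the spirit of the benchmark script, not the script itself:

```python
# Hypothetical auto-calibrating micro-benchmark helper (not the PR's script):
# grow the iteration count until one timing run lasts about `target` seconds,
# then report the best per-call time over `repeats` runs.
import timeit

def best_per_call(fn, target=0.02, repeats=3):
    n = 1
    while timeit.timeit(fn, number=n) < target:
        n *= 10
    return min(timeit.timeit(fn, number=n) for _ in range(repeats)) / n
```

Taking the best of several repeats (rather than the mean) is the usual way to suppress scheduler noise in micro-benchmarks of this kind.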
Add a new section to docs/performance.rst covering the three fast serialization paths (single-row ndarray, bulk serialize_numpy_bulk, bytes passthrough) with usage examples and supported subtype list.
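Conceptually, the bulk path could look like the following sketch. These are assumed internals for illustration; the real classmethod is VectorType.serialize_numpy_bulk and may differ in detail.

```python
# Sketch of the bulk idea (assumed internals, not the driver's code):
# byte-swap the whole (N, dim) float32 batch to big-endian once,
# then slice the raw buffer into one bytes object per row.
import numpy as np

def bulk_serialize_sketch(matrix):
    be = np.ascontiguousarray(matrix, dtype='>f4')  # one conversion for all rows
    raw = be.tobytes()
    row = be.shape[1] * be.dtype.itemsize           # bytes per row (dim * 4)
    return [raw[i * row:(i + 1) * row] for i in range(be.shape[0])]

batch = np.ones((100, 768), dtype=np.float32)
blobs = bulk_serialize_sketch(batch)
assert len(blobs) == 100 and len(blobs[0]) == 768 * 4
```

Each resulting blob can then be bound directly, hitting the bytes-passthrough branch of serialize() with no further per-element work.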
Force-pushed fba36f3 to 9e4bf4a
Summary
- Fast serialization paths in VectorType for fixed-size numeric subtypes (float, double, int, bigint), delivering 70–300× speedups for high-dimensional vector inserts
- New serialize_numpy_bulk() classmethod for batch workloads (e.g. loading embeddings from Parquet files in VectorDBBench)
- NumPy remains an optional dependency, guarded by try/except ImportError

Motivation
The VectorDBBench use case (https://github.com/scylladb/VectorDBBench) pipelines data as Parquet → PyArrow/NumPy → wire. The existing VectorType.serialize() performed 768+ individual struct.pack('>f', val) + BytesIO.write() calls per vector, making serialization the bottleneck for bulk inserts.

What's new
Three fast paths in VectorType.serialize():

1. bytes/bytearray passthrough – if the caller already holds a correctly-sized blob (e.g. from serialize_numpy_bulk()), return it directly with zero conversion
2. NumPy ndarray fast path – np.asarray(v, dtype='>f4').tobytes() instead of per-element struct.pack
3. serialize_numpy_bulk() classmethod – byte-swap an entire 2-D array (N_rows × dim) once and slice the raw buffer into a list[bytes]

Benchmark results
768-dim float32, 100-row batches (Python 3.14, NumPy 2.3, best of 3):
In absolute terms: serializing 100 × 768-dim float32 vectors drops from ~6.2 ms (baseline) to ~60 µs (bulk + passthrough end-to-end), a 103× improvement. The bottleneck was 76,800 individual struct.pack('>f', val) + BytesIO.write() calls (768 elements × 100 rows); NumPy replaces all of that with a single array byte-swap + buffer slice.

768-dim float32, single vector (latency per call):
Per-vector latency: a single 768-dim vector goes from ~65 µs to under 1 µs via the NumPy path, or 237 ns if pre-serialized bytes are reused.
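Since the fast path must engage only for fixed-size numeric subtypes, its guard can be sketched as a small dtype map. This is a hypothetical reconstruction: the map name matches the PR's description, but the type-name keys and helper function are assumptions.

```python
# Hypothetical sketch of the fast-path guard (keys and helper are assumed):
# a dtype map covering only the four fixed-size numeric CQL subtypes;
# everything else returns None and falls back to the slow path.
_NUMPY_DTYPE_MAP = {
    'FloatType': '>f4',   # float  (4 bytes)
    'DoubleType': '>f8',  # double (8 bytes)
    'Int32Type': '>i4',   # int    (4 bytes)
    'LongType': '>i8',    # bigint (8 bytes)
}

def numpy_dtype_for(typename, serial_size):
    # Engage only when the subtype has a fixed wire size AND a known dtype.
    if serial_size is not None and typename in _NUMPY_DTYPE_MAP:
        return _NUMPY_DTYPE_MAP[typename]
    return None

assert numpy_dtype_for('FloatType', 4) == '>f4'
assert numpy_dtype_for('ShortType', None) is None  # smallint: excluded
```

Keeping the check this narrow is what makes the fast path safe to add: any subtype not in the map is untouched by the new code.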
Safety
- Variable-size subtypes (smallint, tinyint, text) are excluded from the fast path and continue to use the original element-by-element serialization
- Passing bytes for a variable-size subtype now raises TypeError explicitly (rather than silently falling through)
- _numpy_dtype is only set when subtype.serial_size() is not None and the typename is in the 4-entry _NUMPY_DTYPE_MAP

Commits
docs/performance.rst