perf: optimize time-series write/read hot paths (10's to 100's of ns savings, 2-25x better) #798
Draft
mykaul wants to merge 7 commits into scylladb:master
Author
Follow-up: Memoize
| Type | Uncached | Cached | Speedup |
|---|---|---|---|
| Int32Type (simple) | 157 ns | 23 ns | 6.9x |
| MapType<text, int> | 464 ns | 20 ns | 22.9x |
| SetType<float> | 371 ns | 20 ns | 18.2x |
| ListType<double> | 357 ns | 21 ns | 17.3x |
| TupleType<int, text, bool> | 509 ns | 20 ns | 25.0x |
| MapType<text, list<tuple<int, float, double>>> | 636 ns | 56 ns | 11.4x |
Testing
- 607 unit tests passed (10.2s)
- No regressions
Author
Follow-up: Skip ColDesc creation in bind() when column encryption is disabled
Commit: 3015986
What changed
Split …
Benchmark results (Python 3.14, 200k iterations, inner loop)
Testing
6f057f4 to 3015986
…mestamps, bind

Standalone benchmark covering the hot paths for time-series write/read workloads. Establishes baselines before optimization:

- DateType.serialize (datetime 2025): ~1020 ns/call
- DateType.deserialize (2025): ~695 ns/call
- varint_pack (medium): ~643 ns/call
- varint_unpack (medium): ~1086 ns/call
- MonotonicTimestampGenerator: ~374 ns/call
- BoundStatement.bind (5-col): ~4027 ns/call
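Per-call figures like the baselines above can be gathered with the standard-library `timeit` module. The following is a minimal sketch of that measurement pattern, not the contents of the PR's benchmark script; the helper name `ns_per_call` and the `struct.pack` workload are illustrative assumptions.

```python
import struct
import timeit


def ns_per_call(func, number=200_000, repeat=5):
    """Return the best-of-`repeat` average cost of func() in nanoseconds."""
    # timeit.repeat returns one total (in seconds) per repetition; taking the
    # minimum filters out runs disturbed by other activity on the machine.
    best_total = min(timeit.repeat(func, number=number, repeat=repeat))
    return best_total / number * 1e9


def pack_millis():
    # Stand-in workload (an assumption): pack a millisecond timestamp as a
    # big-endian int64, the wire shape of a CQL timestamp.
    return struct.pack(">q", 1_735_734_645_123)


if __name__ == "__main__":
    print(f"pack_millis: ~{ns_per_call(pack_millis):.0f} ns/call")
```

Reporting the minimum rather than the mean is a common choice for microbenchmarks, because noise can only make a run slower, never faster.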
… in DateType.serialize

Eliminate the intermediate struct_time allocation in DateType.serialize() and Encoder.cql_encode_datetime() by using direct timedelta arithmetic instead of calendar.timegm(v.utctimetuple()).

The old code allocated a 9-field time.struct_time named tuple via utctimetuple(), then calendar.timegm() disassembled it back to an epoch integer. The new code subtracts the epoch datetime directly to get a timedelta, then extracts days/seconds/microseconds as integers, with zero intermediate object allocations. Handles both naive (treated as UTC) and timezone-aware datetimes.

- DateType.serialize datetime: 1022 -> 232 ns/call (4.4x faster)
- DateType.serialize date: 1369 -> 471 ns/call (2.9x faster)
- BoundStatement.bind (5-col): 4027 -> 3073 ns/call (1.3x faster)
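The difference between the two approaches can be sketched as follows. This is a simplified illustration under stated assumptions, not the driver's actual code: the function names and the two module-level epoch constants are hypothetical, and only the `datetime` case is covered.

```python
import calendar
import struct
from datetime import datetime, timezone

# Assumed helper constants: one epoch per "flavour" of datetime, because Python
# refuses to subtract a naive datetime from an aware one.
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_AWARE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_via_timegm(dt):
    """Old style: build a struct_time, then turn it back into epoch seconds."""
    seconds = calendar.timegm(dt.utctimetuple())
    return seconds * 1000 + dt.microsecond // 1000


def millis_via_timedelta(dt):
    """New style: subtract the epoch and read the timedelta's integer fields."""
    epoch = EPOCH_NAIVE if dt.tzinfo is None else EPOCH_AWARE
    delta = dt - epoch
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000


def serialize_timestamp(dt):
    # A CQL timestamp is an 8-byte big-endian signed count of milliseconds.
    return struct.pack(">q", millis_via_timedelta(dt))
```

Both helpers treat a naive datetime as UTC, and they also agree for pre-epoch values because a negative `timedelta` keeps its `seconds` and `microseconds` fields non-negative.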
…bytes

Replace the manual string-formatting hex conversion in varint_unpack() and the byte-by-byte bytearray loop in varint_pack() with Python 3 builtins int.from_bytes() and int.to_bytes().

varint_unpack used '%02x' formatting per byte, str.join, then int(..., 16) to parse back, which costs O(n) string allocations. int.from_bytes is a single C-level call.

varint_pack used a while loop appending individual bytes to a bytearray, then reversing. int.to_bytes computes the result in one C call.

Also fixes the Cython path in cython_marshal.pyx which had the same slow pattern with a TODO comment to optimize.

Adapted from PR scylladb#689 (varint_unpack) with new varint_pack implementation.

- varint_pack medium: 643 -> 90 ns/call (7.1x faster)
- varint_pack large: 1109 -> 96 ns/call (11.6x faster)
- varint_unpack medium: 1086 -> 115 ns/call (9.4x faster)
- varint_unpack large: 1940 -> 146 ns/call (13.3x faster)
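A minimal sketch of the builtin-based approach, assuming the varint wire format is a big-endian two's-complement integer of minimal length. The function bodies are illustrative and may differ from the PR's code; `varint_unpack_hex` is a hypothetical reconstruction of the hex-formatting style described above, kept only for comparison.

```python
def varint_pack(value: int) -> bytes:
    """Encode an int as minimal-length big-endian two's complement."""
    # Magnitude bits plus one sign bit, rounded up to whole bytes.
    magnitude = value if value >= 0 else ~value
    n_bytes = (magnitude.bit_length() + 1 + 7) // 8
    return value.to_bytes(n_bytes, "big", signed=True)


def varint_unpack(data: bytes) -> int:
    """Decode big-endian two's complement in a single C-level call."""
    return int.from_bytes(data, "big", signed=True)


def varint_unpack_hex(data: bytes) -> int:
    """Slow reference: format every byte as hex, join, then parse the string."""
    value = int("".join("%02x" % b for b in data), 16)
    if data[0] & 0x80:
        # High bit set means negative: subtract 2**(number of bits).
        value -= 1 << (len(data) * 8)
    return value
```

For example, `varint_pack(128)` needs a leading zero byte so the high bit is not read as a sign, while `varint_pack(-128)` fits in a single byte.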
… and integer arithmetic

Replace int(time.time() * 1e6) with time.time_ns() // 1000 to avoid float precision loss for timestamps far from epoch.

Remove unnecessary lock acquisition in __init__ (no other thread can see the object yet).

Use integer literal 1_000_000 instead of float 1e6 in _maybe_warn threshold/interval comparisons.

Update tests to mock time_ns and use nanosecond input values.
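The precision argument can be illustrated with a fixed nanosecond reading. This is a sketch under assumptions: the helper names and the `MonotonicMicros` class are hypothetical and much smaller than the driver's `MonotonicTimestampGenerator`.

```python
import threading
import time


def micros_from_ns(ns: int) -> int:
    """Integer path: exact, floor division never rounds through a float."""
    return ns // 1000


def micros_from_float_seconds(ns: int) -> int:
    """Float path: seconds as a double, scaled by 1e6 and truncated."""
    seconds = ns / 1e9  # what time.time() would hand back for this instant
    return int(seconds * 1e6)


class MonotonicMicros:
    """Hypothetical minimal generator: strictly increasing microsecond stamps."""

    def __init__(self):
        self._lock = threading.Lock()
        # No lock needed here: the object is not yet visible to other threads.
        self.last = 0

    def __call__(self) -> int:
        with self._lock:
            now = time.time_ns() // 1000
            # Never go backwards, even if the clock does or calls collide.
            self.last = max(now, self.last + 1)
            return self.last
```

At present-day epoch values a double has roughly quarter-microsecond resolution once scaled to microseconds, so the float path can land one microsecond away from the exact result; the integer path cannot.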
Restore serializers.pyx/pxd from the PR scylladb#748 branch and add SerDateType that serializes datetime/date/numeric values to 8-byte big-endian int64 millisecond timestamps entirely in C, avoiding Python-level struct.pack. Uses the same timedelta arithmetic as the pure-Python DateType.serialize (Item B) but with C-level int64 byte-swapping. Benchmark shows ~1.5x speedup over the already-optimized Python path for datetime serialization (253 ns vs 381 ns per call).
Cache the computed CQL type string in a _cql_type_str class attribute. The string is computed lazily on first call and returned from cache on subsequent calls. Since type classes are immutable after apply_parameters(), no invalidation logic is needed.

All 6 cql_parameterized_type() overrides are covered: _CassandraType (base), TupleType, UserType, CompositeType, DynamicCompositeType, VectorType.

Benchmark (500k iters, Python 3.14):

- Int32Type (simple): 6.9x (157 -> 23 ns)
- MapType<text, int>: 22.9x (464 -> 20 ns)
- SetType<float>: 18.2x (371 -> 20 ns)
- ListType<double>: 17.3x (357 -> 21 ns)
- TupleType<int,text,bool>: 25.0x (509 -> 20 ns)
- Nested map/list/tuple: 11.4x (636 -> 56 ns)
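The lazy class-level cache described above might look roughly like this. It is a sketch, not the driver's implementation: `CqlType` and its subclasses are simplified stand-ins, and the way `apply_parameters()` builds a new class here is an assumption made to keep the example self-contained.

```python
class CqlType:
    """Simplified stand-in for a CQL type class (hypothetical)."""

    typename = "unknown"
    subtypes = ()

    @classmethod
    def apply_parameters(cls, subtypes):
        # Each parameterization becomes its own class, so each one gets its
        # own cache slot and never needs invalidation afterwards.
        return type(cls.__name__, (cls,), {"subtypes": tuple(subtypes)})

    @classmethod
    def _build_cql_type(cls):
        if not cls.subtypes:
            return cls.typename
        inner = ", ".join(s.cql_parameterized_type() for s in cls.subtypes)
        return f"{cls.typename}<{inner}>"

    @classmethod
    def cql_parameterized_type(cls):
        # Look only in this class's own __dict__, so a parameterized subclass
        # never returns a string that was cached on its parent.
        cached = cls.__dict__.get("_cql_type_str")
        if cached is None:
            cached = cls._build_cql_type()
            cls._cql_type_str = cached
        return cached


class Int32Type(CqlType):
    typename = "int"


class TextType(CqlType):
    typename = "text"


class MapType(CqlType):
    typename = "map"
```

Checking `cls.__dict__` instead of using plain attribute lookup is a deliberate choice in this sketch: ordinary lookup would follow inheritance and could hand a subclass its parent's cached string.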
Split BoundStatement.bind() into two code paths: when column_encryption_policy is None (the overwhelmingly common case), skip ColDesc namedtuple creation, ce_policy.contains_column() check, and ce_policy.column_type() lookup per column. Call col_spec.type.serialize() directly instead.

When column encryption IS enabled, behavior is unchanged.

Benchmark (inner loop only, 200k iters, Python 3.14):

- 3-col: 1375 -> 523 ns (2.63x, saving 852 ns/bind)
- 5-col: 2226 -> 1013 ns (2.20x, saving 1213 ns/bind)
- 8-col: 3495 -> 1317 ns (2.65x, saving 2178 ns/bind)
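The two-path split can be sketched as below. Everything here is a hypothetical stand-in chosen for illustration: the `ColSpec` and `ColDesc` field names, the `BigIntType` serializer, the toy `ReverseBytesPolicy`, and in particular the `encrypt()` call are assumptions, not the driver's actual `bind()` body.

```python
import struct
from collections import namedtuple

# Hypothetical shapes, assumed for this sketch only.
ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])
ColSpec = namedtuple("ColSpec", ["keyspace", "table", "name", "type"])


class BigIntType:
    @staticmethod
    def serialize(value):
        return struct.pack(">q", value)


class ReverseBytesPolicy:
    """Toy policy: 'encrypts' by reversing bytes, to make the path visible."""

    def __init__(self, columns):
        self.columns = columns  # maps ColDesc -> type class

    def contains_column(self, desc):
        return desc in self.columns

    def column_type(self, desc):
        return self.columns[desc]

    def encrypt(self, desc, data):
        return data[::-1]


def bind_values(col_specs, values, ce_policy=None):
    if ce_policy is None:
        # Fast path: no ColDesc allocation and no policy lookups per column.
        return [spec.type.serialize(v) for spec, v in zip(col_specs, values)]

    # Slow path: per-column policy checks, only paid when encryption is on.
    out = []
    for spec, v in zip(col_specs, values):
        desc = ColDesc(spec.keyspace, spec.table, spec.name)
        if ce_policy.contains_column(desc):
            col_type = ce_policy.column_type(desc)
            out.append(ce_policy.encrypt(desc, col_type.serialize(v)))
        else:
            out.append(spec.type.serialize(v))
    return out
```

Hoisting the `ce_policy is None` test out of the loop means the common case pays for one branch per bind rather than one namedtuple and two method calls per column.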
3015986 to 35ee857
Summary
Optimize the serialization and deserialization hot paths most relevant to time-series workloads. Each commit is an independent, benchmarked improvement.
Changes
Commit 1: Microbenchmark suite
- `benchmarks/bench_timeseries.py`: standalone `timeit`-based microbenchmarks covering `DateType.serialize`/`deserialize`, `varint_pack`/`unpack`, `MonotonicTimestampGenerator`, and `BoundStatement.bind()` for a 5-column time-series schema.

Commit 2: DateType.serialize - replace calendar.timegm with integer arithmetic
- Replace `calendar.timegm(v.utctimetuple())` with `timedelta` arithmetic in `cqltypes.py` and `encoder.py`, eliminating the costly `struct_time` allocation.

Commit 3: varint_pack/varint_unpack - use int.to_bytes/int.from_bytes
- Replace the manual byte-by-byte conversions in `marshal.py` and `cython_marshal.pyx` with the Python 3 builtins `int.to_bytes()` / `int.from_bytes()`.

Commit 4: MonotonicTimestampGenerator - use time.time_ns()
- Replace `int(time.time() * 1e6)` with `time.time_ns() // 1000` for exact microsecond precision and pure integer arithmetic.
- Remove the unnecessary lock acquisition in `__init__`.

Commit 5: Cython-accelerated SerDateType timestamp serializer
- Restore `serializers.pyx`/`.pxd` from the PR #748 branch ("(improvement) serializers: add Cython-optimized serialization for VectorType (up to 30x faster - 823us -> 4us!)").
- Add `SerDateType`, which serializes datetime/date values to 8-byte big-endian int64 timestamps using C-level byte-swapping.
- `TimestampType` → `SerDateType` mapping, so `TimestampType` (a subclass of `DateType`) correctly uses the Cython fast path.

Benchmark Results (Python 3.14.3)
DateType.serialize (Python path)
SerDateType (Cython path)
Note: Master baseline used a stale pre-built `.so`; the `.pyx`/`.pxd` sources did not exist on master. The main win from Commit 5 is that `serializers.pyx` is now source-tracked and `TimestampType` correctly resolves to the Cython fast path.
varint_pack / varint_unpack
MonotonicTimestampGenerator
End-to-end: BoundStatement.bind (5-col time-series row)
Testing