(improvement) Add __slots__ to _Frame to eliminate per-instance __dict__ (30-40ns saving, 264 bytes saved per call) #802

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/frame-slots

mykaul commented Apr 6, 2026

Summary

Add __slots__ to the _Frame class in cassandra/connection.py. Eliminates per-instance __dict__ allocation.

Motivation

_Frame is instantiated for every response frame received from the server. It has exactly 6 fixed attributes (version, flags, stream, opcode, body_offset, end_pos) and is never monkey-patched or dynamically extended. Adding __slots__ removes the per-instance __dict__, reducing memory pressure on high-throughput workloads.

Benchmark (CPython 3.14, per-call)

Memory:

|  | Size |
|---|---|
| Original (obj + __dict__) | 48 + 296 = 344 bytes |
| Optimized (__slots__) | 80 bytes |
| Savings per frame | 264 bytes (76.7%) |
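
The per-instance figures above can be reproduced along these lines. This is a minimal sketch, not the script used for the PR: `DictFrame` and `SlotFrame` are hypothetical stand-ins for the original and optimized `_Frame`, and the constructor arguments are arbitrary. Reading `__dict__` may create the dictionary on demand on recent CPython versions, so the dict-based total is best read as an upper bound.

```python
import sys

FIELDS = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos')


class DictFrame:
    """Hypothetical stand-in for the original _Frame (per-instance __dict__)."""

    def __init__(self, version, flags, stream, opcode, body_offset, end_pos):
        self.version = version
        self.flags = flags
        self.stream = stream
        self.opcode = opcode
        self.body_offset = body_offset
        self.end_pos = end_pos


class SlotFrame:
    """Hypothetical stand-in for the optimized _Frame (fixed slots)."""

    __slots__ = FIELDS

    def __init__(self, version, flags, stream, opcode, body_offset, end_pos):
        self.version = version
        self.flags = flags
        self.stream = stream
        self.opcode = opcode
        self.body_offset = body_offset
        self.end_pos = end_pos


def shallow_size(obj):
    """Shallow size of the object plus its __dict__, if it has one."""
    size = sys.getsizeof(obj)
    instance_dict = getattr(obj, '__dict__', None)  # None for slotted objects
    if instance_dict is not None:
        size += sys.getsizeof(instance_dict)
    return size


args = (4, 0, 1, 8, 9, 100)  # arbitrary example values
dict_size = shallow_size(DictFrame(*args))
slot_size = shallow_size(SlotFrame(*args))
print(f"dict-based: {dict_size} bytes, slotted: {slot_size} bytes, "
      f"saved: {dict_size - slot_size} bytes")
```

The exact byte counts depend on the CPython version and platform, so they may not match the table.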

Timing:

| Operation | Original | Optimized | Savings per call |
|---|---|---|---|
| Construction | 146ns | 118ns | 28ns (19%) |
| Attribute access (4 attrs) | 78ns | 40ns | 38ns (49%) |
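
A timing comparison of this kind could be set up with `timeit`, as in the sketch below. The class names, the helper `ns_per_call`, and the iteration counts are assumptions for illustration, not the harness behind the numbers in the table. The wrapping lambda adds a constant overhead to every measurement, so only the difference between the two classes is meaningful.

```python
import timeit

FIELDS = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos')
ARGS = (4, 0, 1, 8, 9, 100)  # arbitrary example values


def _init(self, version, flags, stream, opcode, body_offset, end_pos):
    # Shared constructor body so both classes do identical work.
    self.version = version
    self.flags = flags
    self.stream = stream
    self.opcode = opcode
    self.body_offset = body_offset
    self.end_pos = end_pos


class DictFrame:
    """Hypothetical stand-in for the original _Frame."""
    __init__ = _init


class SlotFrame:
    """Hypothetical stand-in for the optimized _Frame."""
    __slots__ = FIELDS
    __init__ = _init


def construct(cls):
    return cls(*ARGS)


def read_four(frame):
    # Mirrors the "4 attrs" access pattern from the table.
    return frame.version, frame.flags, frame.stream, frame.opcode


def ns_per_call(func, arg, number=200_000, repeat=5):
    """Best-of-`repeat` time for func(arg), in nanoseconds per call."""
    timer = timeit.Timer(lambda: func(arg))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


for cls in (DictFrame, SlotFrame):
    print(cls.__name__,
          f"construction: {ns_per_call(construct, cls):.0f} ns,",
          f"4-attribute read: {ns_per_call(read_four, cls(*ARGS)):.0f} ns")
```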

Bulk (10K frames):

  • 1,285,888 → 885,120 bytes: about 400 KB saved (31%)
  • At 1K concurrent in-flight frames: ~257 KiB saved (1,000 × 264 bytes), reducing GC pressure
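
One way to take a bulk measurement like this is with `tracemalloc`, sketched below. The PR does not say which tool produced the bulk numbers, so the use of `tracemalloc`, the class names, and the helper `traced_bytes` are all assumptions. The totals include the list that holds the frames, which is the same for both classes.

```python
import tracemalloc

FIELDS = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos')
ARGS = (4, 0, 1, 8, 9, 100)  # small cached ints, so arguments add no allocations


def _init(self, version, flags, stream, opcode, body_offset, end_pos):
    self.version = version
    self.flags = flags
    self.stream = stream
    self.opcode = opcode
    self.body_offset = body_offset
    self.end_pos = end_pos


class DictFrame:
    """Hypothetical stand-in for the original _Frame."""
    __init__ = _init


class SlotFrame:
    """Hypothetical stand-in for the optimized _Frame."""
    __slots__ = FIELDS
    __init__ = _init


def traced_bytes(cls, count=10_000):
    """Bytes allocated while building a list of `count` frames."""
    tracemalloc.start()
    before = tracemalloc.get_traced_memory()[0]
    frames = [cls(*ARGS) for _ in range(count)]
    after = tracemalloc.get_traced_memory()[0]
    tracemalloc.stop()
    assert len(frames) == count  # keeps the list alive until after measuring
    return after - before


original = traced_bytes(DictFrame)
optimized = traced_bytes(SlotFrame)
print(f"original: {original:,} bytes, optimized: {optimized:,} bytes, "
      f"saved: {original - optimized:,} bytes")
```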

Changes

  • cassandra/connection.py: Add __slots__ = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos') to _Frame
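
For readers who have not opened the diff, the change amounts to something like the following. This is a sketch: the `__slots__` tuple is taken from the description above, while the constructor signature and the example values are assumptions and may differ from the real `_Frame` in cassandra/connection.py.

```python
class _Frame(object):
    # With __slots__, CPython reserves fixed storage for exactly these six
    # attributes and does not create a per-instance __dict__.
    __slots__ = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos')

    # Assumed constructor: one positional argument per slot.
    def __init__(self, version, flags, stream, opcode, body_offset, end_pos):
        self.version = version
        self.flags = flags
        self.stream = stream
        self.opcode = opcode
        self.body_offset = body_offset
        self.end_pos = end_pos


frame = _Frame(4, 0, 1, 8, 9, 100)  # arbitrary example values
print(frame.opcode, hasattr(frame, '__dict__'))
```

The trade-off is that assigning any attribute not listed in `__slots__` raises `AttributeError`, which is acceptable here only because `_Frame` is never dynamically extended.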

Testing

Unit tests pass (28/28 in test_connection.py). Verified that _Frame instances no longer have __dict__.
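
The `__dict__` check could be captured as a regression test along these lines. This is a hypothetical sketch rather than a test from the PR: it defines its own `_Frame` with an assumed constructor so that it runs standalone, whereas a real test would import the class from cassandra.connection.

```python
import unittest


class _Frame(object):
    """Local stand-in with an assumed constructor; not imported from the driver."""

    __slots__ = ('version', 'flags', 'stream', 'opcode', 'body_offset', 'end_pos')

    def __init__(self, version, flags, stream, opcode, body_offset, end_pos):
        self.version = version
        self.flags = flags
        self.stream = stream
        self.opcode = opcode
        self.body_offset = body_offset
        self.end_pos = end_pos


class FrameSlotsTest(unittest.TestCase):
    def setUp(self):
        self.frame = _Frame(version=4, flags=0, stream=1, opcode=8,
                            body_offset=9, end_pos=100)

    def test_has_no_instance_dict(self):
        self.assertFalse(hasattr(self.frame, '__dict__'))

    def test_rejects_unknown_attributes(self):
        # Slotted instances cannot grow attributes beyond those declared.
        with self.assertRaises(AttributeError):
            self.frame.extra = 1

    def test_declared_attributes_still_assignable(self):
        self.frame.stream = 7
        self.assertEqual(self.frame.stream, 7)
```

Saved to a file, this can be run with `python -m unittest <module>`.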

_Frame is instantiated for every response frame received from the server.
Adding __slots__ eliminates the per-instance __dict__ allocation (~104 bytes
on CPython), reducing memory pressure on high-throughput workloads.

_Frame only has 6 fixed attributes (version, flags, stream, opcode,
body_offset, end_pos) and is never monkey-patched or dynamically extended.

mykaul marked this pull request as draft April 6, 2026 19:26

mykaul commented Apr 6, 2026

Benchmark results (CPython 3.14, 500k iterations)

Per-instance memory:

|  | Size |
|---|---|
| Original (obj + __dict__) | 48 + 296 = 344 bytes |
| Optimized (__slots__) | 80 bytes |
| Savings per frame | 264 bytes (76.7%) |

Per-call timing:

| Operation | Original | Optimized | Δ per call |
|---|---|---|---|
| Construction | 146ns | 118ns | -28ns |
| Attribute access (4 attrs) | 78ns | 40ns | -38ns |

Bulk allocation (10k frames):

  • Original: 1,285,888 bytes → Optimized: 885,120 bytes: about 400 KB saved (31%)
  • At 1K concurrent in-flight frames: ~257 KiB saved (1,000 × 264 bytes), reducing GC pressure

_Frame has exactly 6 fixed attributes and is never dynamically extended — textbook __slots__ candidate.

mykaul changed the title from "(improvement) Add __slots__ to _Frame to eliminate per-instance __dict__" to "(improvement) Add __slots__ to _Frame to eliminate per-instance __dict__ (30-40ns saving, 264 bytes saved per call)" Apr 7, 2026