
perf(store): lazy-init sortedCache in cachekv.Store #2804

Open

pdrobnjak wants to merge 3 commits into main from perf/lazy-init-sorted-cache

Conversation

@pdrobnjak (Contributor) commented Feb 5, 2026

Summary

  • Defer MemDB (btree + FreeList) allocation in cachekv.Store until the first iterator is actually requested
  • During EVM transaction execution, sortedCache is never accessed — only Get/Set/Delete are called, which use store.cache (sync.Map), not sortedCache
  • This eliminates ~42,000 wasted MemDB allocations per block (1000 txs × ~2 snapshots × 21 module stores) for the EVM transfer workload

Problem

Profiling the Giga OCC executor on origin/main showed:

| Bottleneck | CPU % | Root cause |
| --- | --- | --- |
| Heap allocator lock contention (runtime.lock2) | 23.0% | 24 OCC workers all allocating MemDB/btree simultaneously |
| GC pressure (runtime.gcDrain) | 21.6% | Scanning 9.5 GB/s of allocations |
| Actual EVM execution (evmone) | 11.5% | The useful work |

Top allocators over 30s: btree.NewFreeListG (57 GB) + tm-db.NewMemDB (70 GB) = 127 GB of the 286 GB total — all from cachekv.NewStore() eagerly creating a sortedCache that is never used.

sortedCache is only read on the iterator() → dirtyItems() → clearUnsortedCacheSubset() path. For EVM transfers, iteration never happens — the stores only do point reads/writes. Iteration only occurs during EndBlock (receipt flush, ante surplus, pruning) and Cosmos native transactions (gov, staking, bank queries), which account for a tiny fraction of stores created.
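
For context on how a 30-second CPU profile like the one quoted above can be captured, here is a minimal generic sketch using Go's standard runtime/pprof package. This is an illustration only, not the Giga benchmark harness or its actual tooling; the output path cpu.out and the 30-second window are arbitrary choices.

```go
// Minimal sketch: capture a fixed-window CPU profile with runtime/pprof.
// The resulting file can be inspected with `go tool pprof -top cpu.out`.
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	f, err := os.Create("cpu.out") // arbitrary output path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	// Run (or wait on) the workload being measured for the profiling window.
	time.Sleep(30 * time.Second)
	pprof.StopCPUProfile()
}
```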

Changes

4 lines changed in sei-cosmos/store/cachekv/store.go (a sketch of the resulting pattern follows the list):

  1. NewStore: sortedCache: nil instead of dbm.NewMemDB()
  2. Write(): store.sortedCache = nil instead of dbm.NewMemDB()
  3. New ensureSortedCache(): allocates MemDB on first use
  4. iterator(): calls ensureSortedCache() before dirtyItems()
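
The sketch below illustrates that pattern. It assumes a pared-down Store with only the fields and helpers named in this PR (the real store.go takes a parent store, tracks dirty/deleted entries, and has fuller locking and error handling), so treat it as an illustration rather than the actual diff.

```go
// Simplified sketch of lazy sortedCache initialization; not the actual
// sei-cosmos store.go, just the shape of the change described above.
package cachekv

import (
	"sync"

	dbm "github.com/tendermint/tm-db"
)

type Store struct {
	mtx         sync.Mutex
	cache       sync.Map   // Get/Set/Delete only ever touch this
	sortedCache *dbm.MemDB // nil until an iterator is requested
}

// NewStore no longer allocates a MemDB up front.
func NewStore() *Store {
	return &Store{sortedCache: nil}
}

// Write resets the store; the sorted view is dropped, not reallocated.
func (s *Store) Write() {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	// ... flush dirty entries to the parent store ...
	s.sortedCache = nil
}

// ensureSortedCache allocates the MemDB on first use. Callers hold s.mtx.
func (s *Store) ensureSortedCache() {
	if s.sortedCache == nil {
		s.sortedCache = dbm.NewMemDB()
	}
}

// iterator is the only path that needs the sorted view.
func (s *Store) iterator(start, end []byte, ascending bool) (dbm.Iterator, error) {
	s.mtx.Lock()
	defer s.mtx.Unlock()

	s.ensureSortedCache()
	// ... dirtyItems(start, end) would merge unsorted entries into sortedCache here ...
	if ascending {
		return s.sortedCache.Iterator(start, end)
	}
	return s.sortedCache.ReverseIterator(start, end)
}
```

For the common EVM-transfer case the iterator is never requested, so the MemDB is never built and the per-store allocation cost drops to zero.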

Benchmark Results (M4 Max, EVMTransfer 1000 txs/batch)

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| TPS (median steady-state) | ~7,800 | ~8,200 | +5% |
| runtime.lock2 (heap contention) | 23.0% CPU | 15.4% CPU | -13.45s |
| runtime.gcDrain | 21.6% CPU | 19.0% CPU | -6.95s |
| runtime.(*mheap).allocSpan | 18.4% CPU | 11.4% CPU | -12.02s |
| btree.New / NewMemDB / FreeList | ~11s combined | ~0s | eliminated |
| Total CPU samples (30s) | 142.2s | 124.9s | -12% |

Test plan

  • cd giga/tests && go test ./... -count=1 — all 14 tests pass
  • gofmt -s -l sei-cosmos/store/cachekv/store.go — clean
  • Benchmark: TPS improved ~5%, no regressions
  • CPU profile diff confirms allocation reduction
  • CI passes

🤖 Generated with Claude Code

Each cachekv.NewStore() was eagerly allocating a MemDB (btree + FreeList)
for its sortedCache, even though sortedCache is only used when iterators
are requested. During OCC parallel execution, each block creates ~42,000
cachekv stores (1000 txs * ~2 snapshots * 21 module stores), and for
EVM transfer workloads, none of them ever iterate — they only use
Get/Set/Delete which operate on the sync.Map cache, not sortedCache.

This was the single largest source of allocation pressure: btree.NewFreeListG
(57 GB/30s) + tm-db.NewMemDB (70 GB/30s) = 127 GB of the 286 GB total
allocation rate. All 24 OCC workers contending on the heap allocator lock
(runtime.mheap.lock) made this the #1 CPU bottleneck at 23%.

The fix defers MemDB allocation to the first iterator call via
ensureSortedCache(). For the common case (no iteration), the cost is zero.
For the rare case (EndBlock operations, Cosmos native txs), the MemDB is
allocated on demand with identical behavior.

Benchmark results (M4 Max, EVM transfer workload):
- TPS: ~7,800 -> ~8,200 median (+5%)
- runtime.lock2 (heap contention): 23.0% -> 15.4% CPU
- runtime.gcDrain: 21.6% -> 19.0% CPU
- runtime.(*mheap).allocSpan: 18.4% -> 11.4% CPU
- btree/MemDB/FreeList allocation: essentially eliminated
- Total CPU samples over 30s: 142.2s -> 124.9s (-12%)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
github-actions bot commented Feb 5, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 6, 2026, 3:29 PM |

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.57%. Comparing base (5c26262) to head (4be37df).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-cosmos/store/cachekv/store.go | 77.77% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2804      +/-   ##
==========================================
- Coverage   56.58%   56.57%   -0.01%     
==========================================
  Files        2033     2033              
  Lines      166263   166267       +4     
==========================================
- Hits        94074    94065       -9     
- Misses      63926    63946      +20     
+ Partials     8263     8256       -7     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain | 41.49% <77.77%> (-0.02%) ⬇️ |
| sei-cosmos | 48.13% <77.77%> (+<0.01%) ⬆️ |
| sei-db | 68.72% <ø> (ø) |
| sei-tendermint | 57.91% <ø> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-cosmos/store/cachekv/store.go | 88.82% <77.77%> (+0.24%) ⬆️ |

... and 21 files with indirect coverage changes


Replace ensureSortedCache() + direct field access with a
getOrInitSortedCache() getter at all 3 access sites, preventing
possible nil dereference if sortedCache is reached outside iterator().

Co-Authored-By: Claude Opus 4.5 <[email protected]>
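
A hedged sketch of that getter shape, extending the simplified Store from the earlier snippet; clearUnsortedCacheSubset is shown here only as an illustrative access site, with hypothetical parameters rather than the exact signature in store.go.

```go
// getOrInitSortedCache replaces ensureSortedCache plus direct field reads:
// every access site asks the getter, so a nil sortedCache is never dereferenced.
func (s *Store) getOrInitSortedCache() *dbm.MemDB {
	if s.sortedCache == nil {
		s.sortedCache = dbm.NewMemDB()
	}
	return s.sortedCache
}

// Illustrative access site outside iterator(): writing dirty entries into the
// sorted view goes through the getter instead of touching the field directly.
func (s *Store) clearUnsortedCacheSubset(keys, values [][]byte) error {
	sc := s.getOrInitSortedCache()
	for i, k := range keys {
		if err := sc.Set(k, values[i]); err != nil {
			return err
		}
	}
	return nil
}
```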