
perf(store): lazy CacheMultiStore — defer cachekv alloc until access#2806

Draft
pdrobnjak wants to merge 3 commits into perf/lazy-init-sorted-cache from perf/lazy-cachemultistore

Conversation


@pdrobnjak pdrobnjak commented Feb 5, 2026

Summary

  • Defer cachekv.NewStore allocation in CacheMultiStore until the store is actually accessed via GetKVStore/GetStore/GetGigaKVStore
  • Store parent references lazily in storeParents/gigaStoreParents maps; materialize cachekv wrappers on-demand via ensureStore()/ensureGigaStore()
  • Eliminates ~85% of cachekv allocations per block: OCC's prepareTask discards them immediately, and EVM txs only touch ~3 of 21 modules

Problem

After lazy-init sortedCache (#2804), the next largest allocator was cachekv.NewStore itself — still called eagerly for all ~21 module stores per CacheMultiStore, even though EVM transactions only access ~3 modules. OCC's prepareTask creates a CMS then immediately replaces stores with VersionIndexedStores, discarding the cachekv wrappers. Pure waste.

Profiling after #2804 (30s, M4 Max, 1000 EVM transfers/block):

| Allocator | alloc_space | Notes |
| --- | --- | --- |
| cachekv.NewStore | 27,330 MB | top allocator |
| cachekv.Write (re-creates MemDB) | 12,438 MB | |
| cachemulti.newCacheMultiStoreFromCMS | 43,516 MB (cum) | |
| memclrNoHeapPointers | | #1 self CPU (16.90s) |

Changes

sei-cosmos/store/cachemulti/store.go:

  1. newCacheMultiStoreFromCMS: Store parent KVStore references in storeParents / gigaStoreParents maps instead of eagerly wrapping with cachekv.NewStore
  2. ensureStore() / ensureGigaStore(): Lazy materializers that create the cachekv wrapper on first access (see the sketch after this list)
  3. GetKVStore / GetStore / GetGigaKVStore: Call ensureStore() / ensureGigaStore() before returning
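
A minimal sketch of the lazy wiring described above, using simplified stand-in types (KVStore, newCacheKV, and string keys are illustrative assumptions, not the actual sei-cosmos interfaces or the cachekv.NewStore signature):

```go
package cachemulti

// KVStore and newCacheKV are simplified stand-ins for the real
// types.KVStore interface and the cachekv.NewStore constructor.
type KVStore interface {
	Get(key []byte) []byte
}

type cacheKV struct{ parent KVStore }

func (c *cacheKV) Get(key []byte) []byte { return c.parent.Get(key) }

func newCacheKV(parent KVStore) KVStore { return &cacheKV{parent: parent} }

// Store records raw parents at construction and materializes cachekv
// wrappers only when a store is actually requested.
type Store struct {
	stores       map[string]KVStore // materialized wrappers
	storeParents map[string]KVStore // raw parents, not yet wrapped
}

func newLazyCMS(parents map[string]KVStore) *Store {
	return &Store{
		stores:       make(map[string]KVStore, len(parents)),
		storeParents: parents, // note: no wrapper allocation here
	}
}

// ensureStore creates the wrapper for key on first access.
func (cms *Store) ensureStore(key string) {
	if _, ok := cms.stores[key]; ok {
		return // already materialized
	}
	if parent, ok := cms.storeParents[key]; ok {
		cms.stores[key] = newCacheKV(parent)
	}
}

// GetKVStore materializes lazily, then serves from the stores map.
func (cms *Store) GetKVStore(key string) KVStore {
	cms.ensureStore(key)
	return cms.stores[key]
}
```

On the OCC hot path, SetKVStores can then overwrite entries in stores directly and skip the parents entirely, so the discarded wrappers described above are never allocated in the first place.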

Benchmark Results (M4 Max, 1000 EVM transfers/block, 30s profile)

| Metric | Before (after #2804) | After | Delta |
| --- | --- | --- | --- |
| TPS (steady-state range) | 8,000–8,400 | 8,000–8,800 | +~5% |
| cachekv.NewStore alloc | 27,330 MB | 6,716 MB | -20,614 MB |
| cachekv.Write alloc | 12,438 MB | 4,380 MB | -8,058 MB |
| memclrNoHeapPointers CPU | 16.90s (#1 self) | 6.96s | -9.94s |
| mallocgc CPU (cum) | | | -7.23s |
| runtime.(*mspan).heapBitsSmallForAddr CPU | 4.38s | 2.10s | -2.28s |
| Total CPU samples (30s) | 98.96s | 132.73s | +34% (more useful work) |

Note: total CPU samples increase because workers complete faster → more goroutines actively executing instead of blocked on allocator contention.

pprof -top -diff_base highlights (alloc_space)

+33,659 MB  cachemulti.newStoreWithoutGiga  (lazy path replaces eager path)
+26,591 MB  btree.NewFreeListG              (shifted to lazy-init from #2804)
-20,614 MB  cachekv.NewStore                (direct savings from this PR)
+13,602 MB  gaskv.NewStore                  (still allocated eagerly — addressed in #2808)

pprof -top -diff_base highlights (CPU)

+10.26s  runtime.usleep              (workers idle faster — less contention)
 -9.94s  runtime.memclrNoHeapPointers (fewer large allocs to zero)
 +6.14s  syscall.syscall             (more EVM execution reaching CGO)
 -7.23s  runtime.mallocgc cum        (less allocation overall)
 -2.28s  runtime.heapBitsSmallForAddr (less heap bookkeeping)

Test plan

  • All 14 giga tests pass (cd giga/tests && go test ./... -count=1)
  • All 3 cachemulti tests pass (cd sei-cosmos/store/cachemulti && go test ./... -count=1)
  • gofmt -s -l clean
  • pprof diff confirms allocation reduction
  • CI passes

🤖 Generated with Claude Code

perf(store): lazy CacheMultiStore — defer cachekv alloc until access

CacheMultiStore previously created cachekv.NewStore wrappers eagerly for
all ~21 module stores on every snapshot. This was wasteful because:

1. The OCC scheduler's prepareTask creates a CMS then immediately replaces
   all stores with VersionIndexedStores via SetKVStores — the cachekv
   stores allocated in CMS were discarded unused.
2. EVM transactions only access ~3 of 21 modules, so ~85% of cachekv
   stores per statedb snapshot were never touched.

This change defers cachekv.NewStore allocation by storing parent references
in storeParents/gigaStoreParents maps and materializing cachekv wrappers
lazily on first GetKVStore/GetStore/GetGigaKVStore access.

Benchmark results (M4 Max, 1000 EVM transfers/block):
- TPS: 8,000-8,400 → 8,000-8,800 (~5% median uplift)
- cachekv.NewStore allocations: -20,614 MB (-360M objects)
- cachekv.Write allocations: -8,058 MB
- memclrNoHeapPointers CPU: -9.94s
- mallocgc CPU: -7.23s cumulative

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions bot commented Feb 5, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 6, 2026, 5:20 PM |


codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 49.42529% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.28%. Comparing base (fd2e28d) to head (d52d4a1).
⚠️ Report is 5 commits behind head on perf/lazy-init-sorted-cache.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-cosmos/store/cachemulti/store.go | 49.42% | 42 Missing and 2 partials ⚠️ |
Additional details and impacted files


@@                       Coverage Diff                       @@
##           perf/lazy-init-sorted-cache    #2806      +/-   ##
===============================================================
+ Coverage                        46.91%   52.28%   +5.37%     
===============================================================
  Files                             1965     1030     -935     
  Lines                           160604    85199   -75405     
===============================================================
- Hits                             75341    44546   -30795     
+ Misses                           78734    36523   -42211     
+ Partials                          6529     4130    -2399     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain | ? |
| sei-cosmos | 48.13% <49.42%> (-0.01%) ⬇️ |
| sei-db | 68.72% <ø> (ø) |
| sei-tendermint | 58.08% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-cosmos/store/cachemulti/store.go | 48.33% <49.42%> (-22.15%) ⬇️ |

... and 1444 files with indirect coverage changes


Comment on lines +278 to +282
```go
for k, parent := range cms.storeParents {
	if _, exists := cms.stores[k]; !exists {
		cms.stores[k] = handler(k, parent.(types.KVStore))
	}
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
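
In this path the loop only materializes each store once and the result does not depend on visit order, so the warning is likely benign here; where deterministic order does matter, the usual remedy is to iterate over sorted keys. A minimal sketch, assuming string keys for simplicity (the real maps are keyed by types.StoreKey):

```go
import "sort"

// deterministicRange visits a map in ascending key order, giving a
// stable iteration order regardless of Go's randomized map ranging.
func deterministicRange[V any](m map[string]V, fn func(k string, v V)) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fn(k, m[k])
	}
}
```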
Comment on lines +284 to +286
```go
for k := range cms.storeParents {
	delete(cms.storeParents, k)
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
Comment on lines +296 to +300
```go
for k, parent := range cms.gigaStoreParents {
	if _, exists := cms.gigaStores[k]; !exists {
		cms.gigaStores[k] = handler(k, parent)
	}
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
Comment on lines +302 to +304
```go
for k := range cms.gigaStoreParents {
	delete(cms.gigaStoreParents, k)
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
When CacheMultiStore branches (CacheContext, CacheTxContext), lazy
parents were passed directly to the child CMS. This caused
child.Write() to write directly to the raw commit store, bypassing
the parent's caching layer — leading to state leaks (e.g., Simulate
leaking account sequence increments to deliverState).

Fix: force-materialize all lazy parents on the parent CMS before
creating the child branch. This preserves proper cachekv isolation
while keeping lazy init for the OCC hot path (SetKVStores).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
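
A sketch of the branching fix described in this commit, reusing the simplified Store/newLazyCMS stand-ins from the earlier sketch (the CacheMultiStore method name mirrors the real branching entry point, but the body is illustrative):

```go
// CacheMultiStore branches this CMS. Lazy parents are force-materialized
// first so the child's writes land in this CMS's cachekv layer instead
// of bypassing it and hitting the raw commit store.
func (cms *Store) CacheMultiStore() *Store {
	for k := range cms.storeParents {
		cms.ensureStore(k)
	}
	// The child's parents are this store's materialized wrappers, which
	// preserves cachekv isolation across the branch.
	parents := make(map[string]KVStore, len(cms.stores))
	for k, s := range cms.stores {
		parents[k] = s
	}
	return newLazyCMS(parents)
}
```

This keeps the lazy path intact for OCC, which goes through SetKVStores rather than branching this way, while restoring the isolation that Simulate relies on.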
@pdrobnjak pdrobnjak marked this pull request as draft February 6, 2026 15:28
The lazy ensureStore/ensureGigaStore methods write to the stores and
storeParents maps on first access. When multiple goroutines share a
CMS (e.g., slashing BeginBlocker's HandleValidatorSignatureConcurrent),
concurrent calls to GetKVStore race on these map operations.

Add *sync.RWMutex with double-checked locking: the fast path (store
already materialized) takes only an RLock; the slow path (first access)
takes a full Lock. After all stores are materialized, every call hits
the RLock-only fast path with ~10-20ns overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
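
A sketch of the double-checked locking described above, again on the simplified stand-ins (sync.RWMutex is the real primitive; the field and method shapes are illustrative):

```go
import "sync"

// SafeStore guards the lazy maps: the fast path (store already
// materialized) takes only a read lock; the slow path takes the full
// lock and re-checks before writing.
type SafeStore struct {
	mtx          sync.RWMutex
	stores       map[string]KVStore
	storeParents map[string]KVStore
}

func (cms *SafeStore) GetKVStore(key string) KVStore {
	cms.mtx.RLock()
	s, ok := cms.stores[key]
	cms.mtx.RUnlock()
	if ok {
		return s // fast path: RLock only
	}

	cms.mtx.Lock()
	defer cms.mtx.Unlock()
	if s, ok := cms.stores[key]; ok {
		return s // another goroutine materialized it while we waited
	}
	s = newCacheKV(cms.storeParents[key])
	cms.stores[key] = s
	return s
}
```

Once every store a workload touches has been materialized, all subsequent calls stay on the RLock-only fast path, which matches the ~10-20ns overhead noted above.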
Comment on lines +103 to +105
```go
for k := range cms.storeParents {
	cms.ensureStoreLocked(k)
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
Comment on lines +106 to +108
```go
for k := range cms.gigaStoreParents {
	cms.ensureGigaStoreLocked(k)
}
```

Check warning (Code scanning / CodeQL): Iteration over map may be a possible source of non-determinism.
