Skip to content

fix(db): nested toArray includes drop children when sibling groups share a correlation key (#1501)#1607

Open
kevin-dp wants to merge 4 commits into
TanStack:mainfrom
kevin-dp:repro/1501-nested-toarray-shared-buffer
Open

fix(db): nested toArray includes drop children when sibling groups share a correlation key (#1501)#1607
kevin-dp wants to merge 4 commits into
TanStack:mainfrom
kevin-dp:repro/1501-nested-toarray-shared-buffer

Conversation

@kevin-dp

@kevin-dp kevin-dp commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Fixes #1501.

The bug

With three (or more) levels of nested toArray includes (e.g. products → priceRanges → region), when two children in different parent groups share the same deepest correlation key, only one of them receives the nested rows — the other comes back as an empty array:

T-Shirt → priceRange 1 (regionId 1) → region: []          ❌  expected [Europe]
T-Shirt → priceRange 2 (regionId 2) → region: [North America]   ✓
Hoodie  → priceRange 3 (regionId 1) → region: [Europe]          ✓

Root cause

The nested pipeline writes results into a buffer shared across per-parent-group states, and routing assumes one parent per nested correlation key. Two compounding issues:

  1. 1:1 routing + destructive drain. nestedRoutingIndex mapped each nested correlation key to a single parent group (last-writer-wins), and drainNestedBuffers deleted the buffer entry after routing to the first match — so a sibling group sharing the key was dropped.
  2. No re-emit for late arrivals. The nested pipeline does not re-emit already-materialized rows, so a parent group that starts referencing an existing correlation key after the rows were drained (e.g. a sibling inserted after the initial load) saw nothing.

Fix

  • nestedRoutingIndex now maps a nested correlation key to a Set of parent groups; drainNestedBuffers fans buffered grandchild changes out to every ready parent group before dropping the buffer entry. Routing-index inserts/deletes and parent-delete cleanup maintain the per-key parent sets.
  • A per-level cumulative snapshot of net-present grandchild rows seeds late-arriving parent groups with the rows their siblings already received.

Tests

Three tests under nested toArray includes (depth 3+) cover initial load, a sibling inserted after load, and deleting one of two siblings that share a correlation key. Full tests/query/ suite (1596 tests) passes.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Resolved a regression in nested toArray includes where sibling parent groups sharing the same deepest correlation key could produce incorrect or missing nested results.
    • Improved live nested includes behavior to correctly route and seed nested relation changes for newly-associated parents, and maintain consistency across inserts and deletes.
  • Tests
    • Expanded nested includes coverage to verify correct deepest-correlation behavior across multiple nesting levels, including live and materialized collection variants, with insertion and deletion scenarios.

…#1501)

Adds a failing test reproducing TanStack#1501: with three collection levels
(products -> priceRanges -> region), when two priceRanges in different
parent groups share the same deepest correlation key (regionId === 1),
one of the two nested `region` arrays comes back empty.

The nested pipeline buffer is shared by reference across per-parent-group
states (createPerEntryIncludesStates) and drainNestedBuffers deletes a
buffer entry after routing it to the first matching parent group, so the
sibling that drains second finds nothing.

Note: the minimal repro in the issue does not trigger the bug as written
(its dummy `eq(p.id, _.id)` correlation against a single-row anchor with
findOne collapses to one product, so the two overlapping siblings never
coexist in the output). This test puts both sibling groups in the result
so the collision actually occurs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Nested include routing now tracks multiple parents per deepest correlation key, replays drained nested rows from snapshots, and adds regression coverage for shared-key three-level includes across preload, insert, delete, live collection, and materialized variants.

Changes

Nested toArray fan-out and snapshotting

Layer / File(s) Summary
Nested snapshot state
packages/db/src/query/live/collection-config-builder.ts
Adds snapshot row tracking, snapshot storage on nested include setup, parent-set routing indexes, and snapshot initialization for each nested pipeline.
Routing and snapshot replay
packages/db/src/query/live/collection-config-builder.ts
Adds helpers to accumulate drained nested rows into snapshots, seed late parent groups, fan buffered rows out to all matching parents, and preserve shared nested routing entries during insert/update/delete handling.
Nested includes regression coverage
.changeset/nested-toarray-shared-buffer-overlap.md, packages/db/tests/query/includes.test.ts
Updates the changeset note and adds regression tests for preload, late insertion, deletion, live collection, and materialized nested includes sharing the same deepest regionId.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested reviewers

  • samwillis

Poem

🐇 Three nested hops, a shared key gleams,
Buffered rows now find all streams.
Siblings keep their regions bright,
Even when the edits bite.
Hop, hop — the arrays stay true,
A tidy fix with tests anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is specific and accurately summarizes the main fix for nested toArray includes with shared correlation keys.
Description check ✅ Passed The description covers the bug, root cause, fix, and tests, though it doesn't follow the template headings exactly.
Linked Issues check ✅ Passed The code and tests address #1501, including shared-key fan-out, late-arriving siblings, deletions, live collections, and materialize variants.
Out of Scope Changes check ✅ Passed The changes stay focused on the bug fix, regression tests, and changeset with no unrelated behavior changes.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@pkg-pr-new

pkg-pr-new Bot commented Jun 22, 2026

Copy link
Copy Markdown
More templates

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@1607

@tanstack/browser-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/browser-db-sqlite-persistence@1607

@tanstack/capacitor-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/capacitor-db-sqlite-persistence@1607

@tanstack/cloudflare-durable-objects-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/cloudflare-durable-objects-db-sqlite-persistence@1607

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@1607

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@1607

@tanstack/db-sqlite-persistence-core

npm i https://pkg.pr.new/@tanstack/db-sqlite-persistence-core@1607

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@1607

@tanstack/electron-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/electron-db-sqlite-persistence@1607

@tanstack/expo-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/expo-db-sqlite-persistence@1607

@tanstack/node-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/node-db-sqlite-persistence@1607

@tanstack/offline-transactions

npm i https://pkg.pr.new/@tanstack/offline-transactions@1607

@tanstack/powersync-db-collection

npm i https://pkg.pr.new/@tanstack/powersync-db-collection@1607

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@1607

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@1607

@tanstack/react-native-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/react-native-db-sqlite-persistence@1607

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@1607

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@1607

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@1607

@tanstack/tauri-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/tauri-db-sqlite-persistence@1607

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@1607

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@1607

commit: 7bd39a8

kevin-dp and others added 2 commits June 24, 2026 09:52
…ation key (TanStack#1501)

With 3+ levels of nested toArray includes, when two children in different
parent groups shared the same deepest correlation key, only one received the
nested rows and the other came back empty.

Two compounding causes:
- nestedRoutingIndex mapped each nested correlation key to a single parent
  group (last-writer-wins), and the shared buffer entry was deleted after
  routing to the first match, so sibling groups sharing the key were dropped.
- the nested pipeline does not re-emit already-materialized rows, so a parent
  group that starts referencing an existing correlation key after the rows
  were drained (e.g. a sibling inserted after the initial load) saw nothing.

Fixes:
- nestedRoutingIndex now maps a nested correlation key to a Set of parent
  groups; drainNestedBuffers fans buffered grandchild changes out to every
  ready parent group before dropping the buffer entry.
- a per-level cumulative snapshot of net-present grandchild rows seeds
  late-arriving parent groups from what their siblings already received.

Routing-index inserts/deletes and parent-delete cleanup are updated to
maintain the per-key parent sets.

Adds tests covering initial load, a sibling inserted after load, and deleting
one of two siblings that share a correlation key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevin-dp kevin-dp changed the title test(db): failing repro for nested toArray dropped children (#1501) fix(db): nested toArray includes drop children when sibling groups share a correlation key (#1501) Jun 24, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/db/tests/query/includes.test.ts (1)

5078-5079: 🩺 Stability & Availability | 🔵 Trivial

Replace the fixed 50ms sleeps with a deterministic wait helper. flushPromises() is already available in the test utilities, and the same change should be applied to the matching delete case at line 5160.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/tests/query/includes.test.ts` around lines 5078 - 5079, The tests
in `includes.test.ts` use fixed 50ms sleeps after `priceRanges.insert(...)`,
which should be replaced with the deterministic `flushPromises()` helper already
available in the test utilities. Update the insert case and the matching delete
case to await `flushPromises()` instead of `setTimeout`, keeping the same
behavior while removing timing flakiness.

Source: Coding guidelines

packages/db/src/query/live/collection-config-builder.ts (1)

1153-1158: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Avoid adding new any to snapshot state.

SnapshotRow.value and the new helper signature can use unknown here; the code only stores/replays the value and does not need unchecked access. As per coding guidelines, **/*.{ts,tsx}: Avoid using any types; use unknown instead when the type is truly unknown.

♻️ Proposed type tightening
 type SnapshotRow = {
-  value: any
+  value: unknown
   orderByIndex: string | undefined
   /** Net multiplicity (inserts − deletes) currently materialized for this row */
   count: number
 }
@@
 function accumulateSnapshot(
   setup: NestedIncludesSetup,
   nestedCorrelationKey: unknown,
-  childChanges: Map<unknown, Changes<any>>,
+  childChanges: Map<unknown, Changes<unknown>>,
 ): void {

Also applies to: 1373-1377

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/query/live/collection-config-builder.ts` around lines 1153 -
1158, The snapshot state is introducing an unnecessary any type in
SnapshotRow.value and the related helper signature, even though the value is
only stored and replayed. Tighten the types in collection-config-builder by
changing SnapshotRow and the affected helper used around the snapshot logic to
use unknown instead of any, and keep the existing handling unchanged except for
type-safe narrowing where needed.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/db/src/query/live/collection-config-builder.ts`:
- Around line 1387-1399: Clone snapshot values before assigning them into row
state in collection-config-builder’s snapshot/replay logic, because row.value is
currently sharing the same object as changes.value and later mutation of
changes.value can strip routing metadata. Update the row initialization and
replay paths in the affected update/merge flow (including the shared logic
around the count/orderByIndex handling) to store a cloned copy instead of the
original object, so later cleanup cannot affect previously buffered rows and
nested routing can be rebuilt correctly.

---

Nitpick comments:
In `@packages/db/src/query/live/collection-config-builder.ts`:
- Around line 1153-1158: The snapshot state is introducing an unnecessary any
type in SnapshotRow.value and the related helper signature, even though the
value is only stored and replayed. Tighten the types in
collection-config-builder by changing SnapshotRow and the affected helper used
around the snapshot logic to use unknown instead of any, and keep the existing
handling unchanged except for type-safe narrowing where needed.

In `@packages/db/tests/query/includes.test.ts`:
- Around line 5078-5079: The tests in `includes.test.ts` use fixed 50ms sleeps
after `priceRanges.insert(...)`, which should be replaced with the deterministic
`flushPromises()` helper already available in the test utilities. Update the
insert case and the matching delete case to await `flushPromises()` instead of
`setTimeout`, keeping the same behavior while removing timing flakiness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ace73dfd-754c-45cb-8106-836a50331bd6

📥 Commits

Reviewing files that changed from the base of the PR and between b7229fc and 5d5259e.

📒 Files selected for processing (3)
  • .changeset/nested-toarray-shared-buffer-overlap.md
  • packages/db/src/query/live/collection-config-builder.ts
  • packages/db/tests/query/includes.test.ts
✅ Files skipped from review due to trivial changes (1)
  • .changeset/nested-toarray-shared-buffer-overlap.md

Comment on lines +1387 to +1399
row = {
value: changes.value,
orderByIndex: changes.orderByIndex,
count: 0,
}
snap.set(childKey, row)
}
row.count += changes.inserts - changes.deletes
if (changes.inserts > 0) {
row.value = changes.value
if (changes.orderByIndex !== undefined) {
row.orderByIndex = changes.orderByIndex
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Clone snapshot values before storing or replaying them.

row.value currently aliases changes.value; later cleanup deletes INCLUDES_ROUTING from changes.value at Line 1984. A late replay can then seed a row without routing metadata, preventing deeper nested routing from being rebuilt via Lines 1544-1547.

🐛 Proposed fix
+function cloneSnapshotValue<T>(value: T): T {
+  if (value == null || typeof value !== `object`) return value
+  if (Array.isArray(value)) return [...value] as T
+  return { ...(value as Record<PropertyKey, unknown>) } as T
+}
+
 function accumulateSnapshot(
   setup: NestedIncludesSetup,
   nestedCorrelationKey: unknown,
   childChanges: Map<unknown, Changes<any>>,
@@
       row = {
-        value: changes.value,
+        value: cloneSnapshotValue(changes.value),
         orderByIndex: changes.orderByIndex,
         count: 0,
       }
@@
     if (changes.inserts > 0) {
-      row.value = changes.value
+      row.value = cloneSnapshotValue(changes.value)
       if (changes.orderByIndex !== undefined) {
         row.orderByIndex = changes.orderByIndex
       }
@@
     byChild.set(childKey, {
       deletes: 0,
       inserts: row.count,
-      value: row.value,
+      value: cloneSnapshotValue(row.value),
       orderByIndex: row.orderByIndex,
     })

Also applies to: 1436-1443

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/query/live/collection-config-builder.ts` around lines 1387 -
1399, Clone snapshot values before assigning them into row state in
collection-config-builder’s snapshot/replay logic, because row.value is
currently sharing the same object as changes.value and later mutation of
changes.value can strip routing metadata. Update the row initialization and
replay paths in the affected update/merge flow (including the shared logic
around the count/orderByIndex handling) to store a cloned copy instead of the
original object, so later cleanup cannot affect previously buffered rows and
nested routing can be rebuilt correctly.

@kevin-dp

Copy link
Copy Markdown
Contributor Author

Here's a complete explanation of the bugs and the fix. This PR implements the snapshot accumulation fix but it means we store each distinct nested row in that snapshot. The explanation below also documents a potential alternative.

The example

products → priceRanges → region

For each product you want its price ranges, and for each price range its region. Three levels deep. Our problem data:

  • product 1 (T-Shirt) has priceRange 1 → regionId 1 (Europe)
  • product 2 (Hoodie) has priceRange 3 → regionId 1 (Europe) ← same region!

How includes actually compute (the important part)

You might imagine the engine runs the "region" query fresh for each price range. It doesn't. That would be slow and wouldn't update reactively. Instead:

There is ONE shared "region" pipeline. It runs once over all regions and emits region rows tagged with a correlation key — the value that says "who do I belong to". For region, that key is regionId. So the region pipeline emits one thing:

regionId 1 → { Europe }

Just one entry. Even though two price ranges want region 1, the pipeline produces it once, keyed by regionId 1.

Results land in a "buffer" — a temporary mailbox: regionId → the region rows. Think of it as "here are freshly-computed region rows, waiting to be delivered."

Then a "drain" step delivers buffered rows to the right parents. This is the routing step. To deliver region 1 to the correct price range, the engine keeps a routing index:

routingIndex:  regionId 1  →  "which price-range group does this belong to?"

Each price-range group has its own private copy of region results (its own little child collection), so the final tree can show each one its own region array. The drain reads the buffer, looks up the parent in the routing index, copies the rows into that parent's private copy, and then deletes the buffer entry (it's been delivered — don't deliver twice).

Why it broke

The whole design quietly assumed one nested key belongs to one parent. Two things follow from that assumption, and both are wrong when two siblings share regionId 1:

Bug 1 — the routing index only remembered one parent.

routingIndex.set(regionId 1, priceRange1)   // T-Shirt's
routingIndex.set(regionId 1, priceRange3)   // Hoodie's — OVERWRITES

It's a plain key→value map, so the second write clobbers the first. Now regionId 1 points to only one group. Then drain delivers region 1 to that one group and deletes the buffer entry. The other group's lookup finds nothing → its region comes back []. That's the empty array you saw.

Bug 2 — late arrivals get nothing.

Say T-Shirt's priceRange exists at load, but Hoodie's priceRange 3 is inserted later. By then the region pipeline already emitted region 1 (at load), it was drained, and the buffer entry was deleted. When priceRange 3 shows up, the pipeline does not re-emit region 1 — from its point of view nothing about regions changed; it already produced that row. So the buffer is empty and there's nothing to deliver to Hoodie. Empty array again.

Both of these were confirmed with debug logging before writing the fix.

The fix — two matching pieces

Fix 1: let one nested key map to many parents

The routing index value changes from a single parent to a Set of parents:

routingIndex:  regionId 1  →  { priceRange1, priceRange3 }
  • Adding a parent now adds to the set instead of overwriting (the updateRoutingIndex insert change).
  • Removing a parent removes just that one from the set, leaving the others (the delete change and cleanRoutingIndexOnDelete). This is what keeps the deletion case working — deleting Hoodie mustn't wipe regionId 1 while T-Shirt still uses it.
  • Drain now loops over every parent in the set and copies the rows into each one's private copy, and only then deletes the buffer entry. That's the for (const parentCorrelationKey of parentCorrelationKeys) loop. This fixes the initial-load case.

Fix 2: remember delivered rows so late arrivals can be seeded

Because the pipeline won't re-emit region 1 for a price range added later, the engine needs its own memory of "what region rows currently exist for regionId 1". That's the new snapshot:

snapshot:  regionId 1  →  { Europe }   (kept around, not deleted on drain)
  • Every time something is drained, it's also folded into the snapshot (accumulateSnapshot). The snapshot tracks a running count so that if a region is later deleted, it drops out of the snapshot too — it stays an accurate picture of "what currently exists", not a pile of stale rows.
  • When a new parent group first references an existing key (isNewParent in updateRoutingIndex), we seed it straight from the snapshot (seedParentFromSnapshot) — hand it the rows its siblings already have, without waiting on the pipeline. This fixes the insert-after-load case.

One-line summary

The nested-includes system assumed each nested key had exactly one parent and treated delivered rows as throwaway. The fix makes a nested key able to feed multiple parents (Set instead of single value + fan-out on drain), and keeps a snapshot of currently-existing rows so parents that show up late can be handed the same data their siblings already received.


Is the snapshot a memory problem?

Partly a fair concern, but it's bounded, not unbounded:

  • It stores one entry per distinct (nestedKey, childKey) — i.e. one copy of each distinct nested row, not one per parent. Siblings sharing a key share the single snapshot entry. So its size tracks the distinct nested result set, not parent count.
  • It holds references to the same row value objects already materialized elsewhere, not deep copies. The extra cost is Map overhead + one slot per distinct row.
  • It shrinks: accumulateSnapshot tracks a net count and deletes rows (and empty keys) when the count hits 0 on delete. So it's not an ever-growing log — it stays proportional to currently-live data.

So it's not a leak and not multiplied by parents. But it is a real added cost: roughly O(distinct nested rows) extra — essentially one more index over the deepest level, for the lifetime of the live query. For a query with a very large nested collection, that's non-trivial memory.


The alternative: copy from an existing sibling instead of keeping a snapshot

The key realization

When does a late-arriving parent need seeding at all? Only when the region rows were already delivered and the buffer entry deleted in an earlier flush, and the pipeline won't re-emit them. But "already delivered to whom?" — to a sibling. So whenever seeding is needed, at least one sibling already has the rows materialized in its own private copy.

That's the insight the alternative exploits: you don't need a separate stash of the data, because a sibling is already holding a perfectly good copy of it. Go ask the sibling.

How "copy from sibling" works

Walk through Hoodie's priceRange 3 arriving late, referencing regionId 1:

  1. Look up the routing index (which is now a Set thanks to Fix 1):

    regionId 1 → { priceRange1, priceRange3 }
    

    priceRange3 is the newcomer; priceRange1 is an existing sibling.

  2. Find a sibling that already has the data. priceRange1's private region state has a childRegistry keyed by regionId, and childRegistry[regionId 1] is a little collection already holding { Europe }.

  3. Copy those rows into priceRange3's pending changes, then let the normal materialize step run. priceRange3 now shows [Europe].

So instead of consulting a snapshot, step 2 reaches into a sibling's already-materialized collection.

What it costs you in code

The snapshot stored exactly what the delivery machinery needs: for each row, its childKey, its value, and its orderByIndex. A live child collection only stores the rows themselves. To rebuild "changes" from it you have to look those extra bits back up:

  • the sibling state's resultKeys map (row → childKey),
  • its orderByIndices map (row → orderByIndex),

then iterate the sibling's collection and reconstruct insert-changes from each row. It's the same idea as seedParentFromSnapshot, just reading from a collection + two side-maps instead of from one tidy snapshot Map.

You also need a small guard: "pick a sibling that is actually populated right now." The routing-index Set gives you candidates, but you'd skip the newcomer itself and any sibling that happens to be empty, and copy from the first good one. (As argued above, a good one is essentially guaranteed to exist whenever seeding is needed — if none did, the buffer path would still be handling it.)


The two approaches compared

Snapshot (current fix) Copy-from-sibling
Standing memory one extra index over distinct nested rows, kept for the query's life none — reuses data already in siblings
Work at steady state a little on every drain (fold into snapshot) none
Work on a late insert cheap: read tidy snapshot a bit more: find a ready sibling, rebuild changes from its collection + side-maps
Code complexity self-contained, one clean structure more fiddly: sibling selection + reconstructing changes
Correctness risk must keep the snapshot's net-count in sync with deletes/updates sibling is always the live truth, so no parallel structure to drift — but "is a sibling ready?" is a subtler question

How to choose

The snapshot trades memory for simplicity and predictability: one clean structure, consulted in one place, no "go find a healthy donor" logic. Copy-from-sibling trades that memory away but pays in code complexity and a reconstruction step on each late insert.

For typical includes (modest nested result sets) the snapshot's overhead is negligible and the simpler code wins. If you expect very large nested collections where doubling an index matters, copy-from-sibling is the better fit — and it's a localized swap: only accumulateSnapshot + seedParentFromSnapshot change; the Set-based routing fan-out (Fix 1) stays exactly as-is either way.

…lize includes

The nested-includes routing fix is independent of how each level is
materialized. Add regression tests proving sibling parent groups that share a
deepest correlation key resolve their grandchildren when the nested levels are
left as live Collections (no wrapper) and when wrapped with materialize(),
mirroring the existing toArray coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/db/tests/query/includes.test.ts`:
- Around line 5171-5261: The current live/materialize regression tests only
verify preload behavior; extend the existing nested-collection coverage in the
relevant `it(...)` cases to assert post-load mutations as well. Use the same
`createLiveQueryCollection`, `toTree`, and nested `q.from(...)` setup to add
insert/delete assertions for the live and materialized variants, and include
async stale-order/race scenarios that exercise the same shared-correlation-key
path after initial load. Keep the new assertions anchored to the existing
collection setup so the bug is reproduced across both materialization modes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e827313e-ce5c-41ab-a147-be0bec3dfe49

📥 Commits

Reviewing files that changed from the base of the PR and between 5d5259e and 7bd39a8.

📒 Files selected for processing (1)
  • packages/db/tests/query/includes.test.ts

Comment on lines +5171 to +5261
// The shared-correlation-key routing is independent of how each level is
// materialized, so the same guarantee must hold when the nested levels are
// left as live Collections (no toArray/materialize wrapper).
it(`resolves nested grandchildren for sibling groups when levels stay Collections`, async () => {
type Product = { id: number; title: string }
type PriceRange = { id: number; productId: number; regionId: number }
type Region = { id: number; name: string }

const products = createCollection(
localOnlyCollectionOptions<Product>({
id: `shared-corr-collection-products`,
getKey: (p) => p.id,
initialData: [
{ id: 1, title: `T-Shirt` },
{ id: 2, title: `Hoodie` },
],
}),
)
const priceRanges = createCollection(
localOnlyCollectionOptions<PriceRange>({
id: `shared-corr-collection-price-ranges`,
getKey: (r) => r.id,
initialData: [
{ id: 1, productId: 1, regionId: 1 },
{ id: 2, productId: 1, regionId: 2 },
{ id: 3, productId: 2, regionId: 1 },
],
}),
)
const regions = createCollection(
localOnlyCollectionOptions<Region>({
id: `shared-corr-collection-regions`,
getKey: (r) => r.id,
initialData: [
{ id: 1, name: `Europe` },
{ id: 2, name: `North America` },
],
}),
)

await Promise.all([
products.preload(),
priceRanges.preload(),
regions.preload(),
])

const collection = createLiveQueryCollection({
id: `shared-corr-collection-live`,
query: (q) =>
q.from({ p: products }).select(({ p }) => ({
id: p.id,
title: p.title,
priceRanges: q
.from({ pr: priceRanges })
.where(({ pr }) => eq(pr.productId, p.id))
.select(({ pr }) => ({
id: pr.id,
regionId: pr.regionId,
region: q
.from({ r: regions })
.where(({ r }) => eq(r.id, pr.regionId))
.select(({ r }) => ({ id: r.id, name: r.name })),
})),
})),
})
await collection.preload()

// toTree recursively unwraps the nested live Collections into arrays.
expect(toTree(collection)).toEqual([
{
id: 1,
title: `T-Shirt`,
priceRanges: [
{ id: 1, regionId: 1, region: [{ id: 1, name: `Europe` }] },
{
id: 2,
regionId: 2,
region: [{ id: 2, name: `North America` }],
},
],
},
{
id: 2,
title: `Hoodie`,
priceRanges: [
{ id: 3, regionId: 1, region: [{ id: 1, name: `Europe` }] },
],
},
])
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Expand live/materialize variants with mutation-path assertions.

These two tests validate preload only. Please add post-load insert/delete coverage (and async stale-order/race coverage) for these same materialization modes to protect the full regression surface.

As per coding guidelines, **/*.test.{ts,tsx,js}: “Always add unit tests that reproduce a bug before fixing it…” and “Test corner cases including … async race conditions …”.

Also applies to: 5262-5353

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/tests/query/includes.test.ts` around lines 5171 - 5261, The
current live/materialize regression tests only verify preload behavior; extend
the existing nested-collection coverage in the relevant `it(...)` cases to
assert post-load mutations as well. Use the same `createLiveQueryCollection`,
`toTree`, and nested `q.from(...)` setup to add insert/delete assertions for the
live and materialized variants, and include async stale-order/race scenarios
that exercise the same shared-correlation-key path after initial load. Keep the
new assertions anchored to the existing collection setup so the bug is
reproduced across both materialization modes.

Source: Coding guidelines

@kevin-dp

Copy link
Copy Markdown
Contributor Author

Had a quick discussion with @samwillis on the 2 alternatives. We decided to go for correctness and simplicity so we pick the snapshot approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3-level nested toArray: shared buffer in createPerEntryIncludesStates drops children when correlation keys overlap across parent groups

1 participant