Skip to content

fix: add user_ids kwarg to LoCoMo / MemBench / MemSim / PersonaMem load_documents#14

Open
King-Brownie wants to merge 1 commit into
vectorize-io:mainfrom
King-Brownie:fix/locomo-load-documents-user-ids-signature
Open

fix: add user_ids kwarg to LoCoMo / MemBench / MemSim / PersonaMem load_documents#14
King-Brownie wants to merge 1 commit into
vectorize-io:mainfrom
King-Brownie:fix/locomo-load-documents-user-ids-signature

Conversation

@King-Brownie
Copy link
Copy Markdown

@King-Brownie King-Brownie commented May 12, 2026

Summary

Fixes the upstream TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids' raised by the standard omb run --dataset locomo --memory bm25 --split locomo10 --query-limit N command at commit 45fa380.

Companion to the issue filed alongside this PR.

Root cause

Dataset.load_documents() (base.py#L81-L92) declares user_ids: set[str] | None = None in its abstract signature. The runner at runner.py#L128 passes user_ids= to dataset.load_documents() whenever the dataset has an isolation_unit AND a query_limit is set.

4 of 7 concrete dataset overrides forgot to include the kwarg:

Dataset user_ids in override before this PR? isolation_unit Bug fires today?
LoComoDataset 'conversation' YES
MemBenchDataset None latent
MemSimDataset None latent
PersonaMemDataset None latent
LongMemEvalDataset 'question'
LifeBenchDataset 'user'
BEAMDataset 'conversation'

Changes

  • LoCoMo: add user_ids to signature and apply the filter if user_ids is not None and sample_id not in user_ids: continue. Matches the existing pattern in BEAMDataset / LongMemEvalDataset / LifeBenchDataset (which filter by their per-doc identifier).
  • MemBench / MemSim / PersonaMem: signature-only fixes. Their isolation_unit=None means the runner never passes them user_ids today, so adding filter logic without a trigger path would be premature. Signature parity with the base class is restored.

Total diff: +6 lines across 4 files.

Test plan

  • Live-validated: omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 3 no longer raises TypeError and progresses to answer-generation.
  • Existing dataset test suite (please run on your CI).

Happy to address any review feedback.

…uments

The base class Dataset.load_documents() declares
``user_ids: set[str] | None = None`` as part of its abstract
contract. The runner at ``src/memory_bench/runner.py:128``
always passes ``user_ids=query_user_ids`` when
``dataset.isolation_unit is not None AND query_limit is not None``,
but 4 of 7 concrete dataset overrides forgot to include the
parameter in their signature:

- LoComoDataset (isolation_unit='conversation') — actively fires
  ``TypeError: LoComoDataset.load_documents() got an unexpected
  keyword argument 'user_ids'`` on
  ``omb run --dataset locomo --memory bm25 --split locomo10
   --query-limit N``.
- MemBenchDataset, MemSimDataset, PersonaMemDataset
  (isolation_unit=None) — latent signature-contract violations
  that would surface if upstream ever sets their isolation_unit.

This change:

- LoComoDataset: add ``user_ids`` to the signature AND apply the
  filter ``if user_ids is not None and sample_id not in user_ids:
  continue`` — matches the reference pattern in BEAMDataset /
  LongMemEvalDataset / LifeBenchDataset (which all filter by
  their own per-doc id and were already correct).
- MemBench / MemSim / PersonaMem: signature-only fixes; the
  runner never passes ``user_ids`` to them today (their
  isolation_unit is ``None``), so adding filter logic without a
  trigger path would be premature. The signature parity matches
  the base-class abstract method.

Live-validated downstream by re-running
``omb run --dataset locomo --memory bm25 --split locomo10
 --query-limit 3`` — the TypeError is gone; the run progresses to
answer generation.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

@ParisT97 is attempting to deploy a commit to the Vectorize Team on Vercel.

A member of the Team first needs to authorize it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants