leaderboard: per-dataset metric derivation, chip filters + metric toggle on every table by radinhamidi · Pull Request #29 · ls3-lab/QueryGym

radinhamidi · 2026-05-20T07:18:26Z

Summary

Comprehensive table-quality pass across every leaderboard page, driven by issues found in the latest review:

- No more phantom metric columns. /datasets/[id] used to render whatever eval_metrics the dataset registry listed (MAP on TREC DL, recall_1000 on BEIR), regardless of whether those metrics actually appeared in the data. The per-dataset shard now derives its columns from the actual run rows, the same approach the home matrix uses.
- Chip filters on every per-X page, not just the home page. /datasets/[id] gets Method/Model/Retriever/Metric; /methods/[id] gets Model/Retriever/Metric; /models/[id] gets Method/Retriever/Metric; /retrievers/[id] gets Method/Model/Metric. Behavior matches the home page (chip → qg-chip-hidden → qg-itable-reapply handshake).
- Metric toggle everywhere. Per-method/model/retriever pages now expose both the primary metric (nDCG@10) and the secondary metric (R@1k or R@100) for each dataset column, swapped via the Metric chip.
- Pretty labels everywhere. Dataset short labels and METRIC_LABEL (ndcg_cut_10 → nDCG@10, recall_1000 → R@1k, recall_100 → R@100, map → MAP) now apply on /datasets/[id] and the per-X pages too.
- Drop the ugly inner scrollbar. The home and /datasets/[id] tables no longer set max-h-[70vh] overflow-y-auto, so the page scrolls naturally and the sticky top-0 thead sticks to the viewport.
- /models index renders the display label (gpt-4.1) rather than the provider-prefixed id (openai/gpt-4.1), matching the /methods index convention.
- /runs/[run_id] reproduce snippet rebuilt against the real example pipeline. Pyserini index names no longer carry the spurious .flat.splade-pp-ed / .flat.bge-base-en-v1.5 suffixes for non-lexical paradigms, and trec_eval now references the qrels key from the dataset registry instead of the topics key.
- /runs/[run_id] Method field shows the display name (Q2D (FS) etc.) instead of the raw method_id.
- /about no longer claims that every row ships a .run.txt and a queries.tsv; both are optional under the current schema. The documented path now includes the {retriever} segment added by PR #20 (Schema: optional artifacts + DL-HARD dataset entry).
- Replaces duplicated cell and chip-bar code across 5 pages with two shared components: MatrixCell.astro (link + primary/secondary spans + sort hooks) and FilterChips.astro (groups + metric special-case + reapply event).
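Below is a minimal TypeScript sketch of deriving metric columns from the run rows and labelling them. It assumes each run row carries a map from metric id to score. RunRow, deriveMetricColumns, metricLabel and METRIC_ORDER are hypothetical names, not the leaderboard's actual code. Only the METRIC_LABEL entries come from this PR.

```typescript
// Hypothetical row shape: one run per row, metric id -> score (null/undefined if absent).
type RunRow = {
  runId: string;
  metrics: Record<string, number | null | undefined>;
};

// Label mapping as listed in this PR.
const METRIC_LABEL: Record<string, string> = {
  ndcg_cut_10: "nDCG@10",
  recall_1000: "R@1k",
  recall_100: "R@100",
  map: "MAP",
};

// Hypothetical display order; metrics not listed here sort after these, alphabetically.
const METRIC_ORDER = ["ndcg_cut_10", "recall_1000", "recall_100", "map"];

function rank(metric: string): number {
  const i = METRIC_ORDER.indexOf(metric);
  return i === -1 ? METRIC_ORDER.length : i;
}

// Columns come only from metrics that hold a finite score in at least one row.
// The dataset registry's eval_metrics list is deliberately not consulted.
function deriveMetricColumns(rows: RunRow[]): string[] {
  const present = new Set<string>();
  for (const row of rows) {
    for (const [metric, value] of Object.entries(row.metrics)) {
      if (typeof value === "number" && Number.isFinite(value)) {
        present.add(metric);
      }
    }
  }
  return Array.from(present).sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}

// Fall back to the raw id for metrics without a pretty label.
function metricLabel(metric: string): string {
  return METRIC_LABEL[metric] ?? metric;
}

// Usage with made-up scores: no row reports recall_1000 or map, so neither becomes a column.
const rows: RunRow[] = [
  { runId: "run-a", metrics: { ndcg_cut_10: 0.68, recall_100: 0.91 } },
  { runId: "run-b", metrics: { ndcg_cut_10: 0.71, recall_100: null } },
];
console.log(deriveMetricColumns(rows).map(metricLabel).join(", ")); // nDCG@10, R@100
```

With this approach, a registry entry that lists MAP or recall_1000 cannot produce an empty column. A metric appears only if some run actually reports it.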

Test plan

- python -m pytest reproducibility/tests/: 44/44 passing
- pnpm --filter @qg/leaderboard build: clean (1113 pages built)
- /datasets/beir-v1.0.0-scifact: single metric column with nDCG@10 + R@100 toggle, no recall_1000 phantom column
- /datasets/msmarco-v1-passage.trecdl2019: no MAP phantom column
- /models/ index card titles show display labels (gpt-4.1, Qwen2.5-72B-Instruct…)
- /runs/* Method field shows Q2D (FS) / Q2D (COT) for query2doc variants
- /runs/* reproduce snippet generates beir-v1.0.0-trec-covid.splade-pp-ed, not .flat.splade-pp-ed
- Home page produces no max-h-[70vh] wrapper
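The reproduce-snippet fix checked above can be sketched as two string builders. This sketch makes three assumptions:

- The snippet evaluates runs with Pyserini's trec_eval wrapper (python -m pyserini.eval.trec_eval).
- Only lexical runs use a .flat index.
- The registry exposes separate topics and qrels keys.

Paradigm, DatasetEntry, INDEX_SUFFIX, pyseriniIndexName and trecEvalCommand are hypothetical names, and the topics/qrels values below are placeholders.

```typescript
// Hypothetical paradigm names; the real registry may use others.
type Paradigm = "lexical" | "splade" | "dense";

// Hypothetical registry entry.
interface DatasetEntry {
  pyseriniPrefix: string; // e.g. "beir-v1.0.0-trec-covid"
  topics: string;         // used for retrieval, NOT for evaluation
  qrels: string;          // what trec_eval needs
}

// Assumed suffix per paradigm. Non-lexical suffixes attach to the bare prefix,
// never to ".flat", which avoids names like ".flat.splade-pp-ed".
const INDEX_SUFFIX: Record<Paradigm, string> = {
  lexical: ".flat",
  splade: ".splade-pp-ed",
  dense: ".bge-base-en-v1.5",
};

function pyseriniIndexName(ds: DatasetEntry, paradigm: Paradigm): string {
  return ds.pyseriniPrefix + INDEX_SUFFIX[paradigm];
}

// Evaluate against the qrels key; the topics key is intentionally unused here.
function trecEvalCommand(ds: DatasetEntry, runFile: string): string {
  return `python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 ${ds.qrels} ${runFile}`;
}

const covid: DatasetEntry = {
  pyseriniPrefix: "beir-v1.0.0-trec-covid",
  topics: "TOPICS_KEY", // placeholder
  qrels: "QRELS_KEY",   // placeholder
};
console.log(pyseriniIndexName(covid, "splade")); // beir-v1.0.0-trec-covid.splade-pp-ed
console.log(trecEvalCommand(covid, "run.txt"));  // python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 QRELS_KEY run.txt
```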

🤖 Generated with Claude Code

…ric toggle on every table

- per-dataset shard reads metrics from actual runs (no MAP/recall_1000 phantom columns)
- shared FilterChips + MatrixCell components reused across home / dataset / method / model / retriever pages
- every per-X table gets chip filters (method/model/retriever/metric as applicable) + metric toggle
- pretty metric labels (nDCG@10, R@1k, R@100, MAP) everywhere
- drop double scrollbar on home + per-dataset tables
- /models index renders display label, not provider-prefixed id
- /runs page shows method display name; reproduce snippet aligned to example pipeline with correct Pyserini index names and qrels-based trec_eval
- /about page no longer claims run.txt/queries.tsv are guaranteed; path includes retriever segment

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
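This is a minimal sketch of the re-apply step behind the shared chip filters, with the DOM left out. Chips only record which values are hidden (the qg-chip-hidden state), and the table recomputes its visible rows when it receives qg-itable-reapply. Row, HiddenChips, MetricMode and reapply are hypothetical names, as are the sample method/model/retriever ids. The real FilterChips.astro presumably toggles classes on existing table rows rather than rebuilding arrays.

```typescript
// Hypothetical row attributes that chip groups filter on.
type Row = {
  method: string;
  model: string;
  retriever: string;
  primary: number | null;   // e.g. nDCG@10
  secondary: number | null; // e.g. R@1k or R@100
};

type Group = "method" | "model" | "retriever";

// Values whose chip is currently hidden, per group.
type HiddenChips = Record<Group, Set<string>>;

// The Metric group selects a score instead of hiding rows.
type MetricMode = "primary" | "secondary";

// Keep rows with no hidden value in any group, then pick the score each cell shows.
function reapply(rows: Row[], hidden: HiddenChips, metric: MetricMode) {
  const groups: Group[] = ["method", "model", "retriever"];
  return rows
    .filter((row) => groups.every((g) => !hidden[g].has(row[g])))
    .map((row) => ({ ...row, shown: metric === "primary" ? row.primary : row.secondary }));
}

// Usage with hypothetical ids and made-up scores.
const sample: Row[] = [
  { method: "method-a", model: "gpt-4.1", retriever: "bm25", primary: 0.5, secondary: 0.8 },
  { method: "method-b", model: "gpt-4.1", retriever: "bm25", primary: 0.6, secondary: 0.85 },
];
const hiddenChips: HiddenChips = {
  method: new Set<string>(["method-b"]),
  model: new Set<string>(),
  retriever: new Set<string>(),
};
console.log(reapply(sample, hiddenChips, "secondary").map((r) => r.shown).join(", ")); // 0.8
```

Treating Metric as a selector rather than a filter is what lets one chip bar drive both row filtering and the primary/secondary toggle.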


radinhamidi wants to merge 1 commit into main from leaderboard/critical-table-revision
