+ QueryGym is split into four surfaces. They share the same data contract,
+ so a run from the dashboard or a third-party submitter lands in the same
+ leaderboard as the toolkit's own results.
+
+
diff --git a/web/site/src/components/FeatureGrid.astro b/web/site/src/components/FeatureGrid.astro
new file mode 100644
index 0000000..15fabd1
--- /dev/null
+++ b/web/site/src/components/FeatureGrid.astro
@@ -0,0 +1,60 @@
+---
+const features = [
+ {
+ title: "Single prompt bank",
+ body:
+ "One YAML registry of every prompt with version, license, and authorship metadata. Cite the exact text used in any run.",
+ icon: "📚",
+ },
+ {
+ title: "Pluggable searchers",
+ body:
+ "Drop-in adapters for Pyserini, PyTerrier, BEIR, MS MARCO, and any custom retriever. Bring your own index.",
+ icon: "🔌",
+ },
+ {
+ title: "Stable run schema",
+ body:
+ "Every run emits a versioned JSON file conforming to a public JSON Schema, with the same shape across the toolkit, dashboard, and third-party submitters.",
+ icon: "🧬",
+ },
+ {
+ title: "OpenAI-compatible LLMs",
+ body:
+ "Works with any OpenAI-compatible endpoint. GPT-4.1, Qwen, Mistral, vLLM, Ollama: switch with a config change.",
+ icon: "🧠",
+ },
+ {
+ title: "Reproducible by design",
+ body:
+ "Every leaderboard row links to a run JSON, a TREC run file, and the reformulated queries. Re-evaluate from a fresh clone.",
+ icon: "🔁",
+ },
+ {
+ title: "Citable artifacts",
+ body:
+ "Backed by two papers (WWW 2026 Demos, SIGIR 2026 Reproducibility) and a tagged reproducibility corpus on GitHub.",
+ icon: "📄",
+ },
+];
+---
+
+
+
+ What you get
+
+ QueryGym pairs a small, opinionated library with a contract-driven
+ reproducibility pipeline. The toolkit, the dashboard, and the leaderboard
+ all share the same data shape.
+
+ Reproducible query reformulation, powered by LLMs.
+
+
+ QueryGym is a toolkit for benchmarking and reproducing LLM-based query
+ rewriting methods across IR datasets. Open prompt bank, pluggable
+ searchers, frozen schemas, citable runs.
+
+ QueryGym is built by the LS3 Lab with collaborators
+ across the IR community. The aim is to make LLM-based query reformulation
+ research easy to reproduce: same prompt bank, same data shape,
+ same retrieval contracts, regardless of which LLM, dataset, or method
+ you're benchmarking.
+
+
+
+ Authors
+
+ Amin Bigdeli · Radin Hamidi Rad · Mert Incesu · Negar Arabzadeh ·
+ Charles L. A. Clarke · Ebrahim Bagheri
+
+
+
+ License
+
+ Apache 2.0. Free to use commercially. Schema and reproducibility
+ artifacts are in the public domain (CC0).
+
+
+
diff --git a/web/site/src/pages/cite.astro b/web/site/src/pages/cite.astro
new file mode 100644
index 0000000..feb8eaa
--- /dev/null
+++ b/web/site/src/pages/cite.astro
@@ -0,0 +1,50 @@
+---
+import Default from "../layouts/Default.astro";
+import CitationCard from "../components/CitationCard.astro";
+
+const wwwBibtex = `@misc{bigdeli2025querygymtoolkitreproduciblellmbased,
+ title={QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation},
+ author={Amin Bigdeli and Radin Hamidi Rad and Mert Incesu and Negar Arabzadeh and Charles L. A. Clarke and Ebrahim Bagheri},
+ year={2025},
+ eprint={2511.15996},
+ archivePrefix={arXiv},
+ primaryClass={cs.IR},
+ url={https://arxiv.org/abs/2511.15996},
+}`;
+
+const sigirBibtex = `@inproceedings{querygym2026reproducibility,
+ title={Reproducing LLM-Based Query Reformulation Across Backends with QueryGym},
+ author={Amin Bigdeli and Radin Hamidi Rad and Mert Incesu and Negar Arabzadeh and Charles L. A. Clarke and Ebrahim Bagheri},
+ booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
+ series={SIGIR '26},
+ year={2026},
+ note={Reproducibility Track},
+}`;
+---
+
+
+
+
+ Cite QueryGym
+
+ Two papers back QueryGym. Use the toolkit citation if you're using the
+ library; add the reproducibility citation if you're comparing against
+ the SIGIR 2026 leaderboard numbers.
+
+
+
+
+
+
+
+
diff --git a/web/site/src/pages/index.astro b/web/site/src/pages/index.astro
new file mode 100644
index 0000000..68b2ce3
--- /dev/null
+++ b/web/site/src/pages/index.astro
@@ -0,0 +1,85 @@
+---
+import Default from "../layouts/Default.astro";
+import Hero from "../components/Hero.astro";
+import FeatureGrid from "../components/FeatureGrid.astro";
+import EcosystemMap from "../components/EcosystemMap.astro";
+import CodeBlock from "../components/CodeBlock.astro";
+import MethodGrid from "../components/MethodGrid.astro";
+---
+
+
+
+
+
+
+
+
+
+
+ Try it in 30 seconds
+
+ Reformulate a query against any OpenAI-compatible endpoint. Pyserini and
+ BEIR are optional extras.
+
+ Run reformulations without writing a single line of code.
+
+
+ The QueryGym Dashboard is the hosted product layer: log in, run
+ methods through a UI, compare results live, manage API keys, and
+ publish to the leaderboard.
+
+
+
diff --git a/web/site/src/pages/install.astro b/web/site/src/pages/install.astro
new file mode 100644
index 0000000..5a7d5dc
--- /dev/null
+++ b/web/site/src/pages/install.astro
@@ -0,0 +1,77 @@
+---
+import Default from "../layouts/Default.astro";
+import CodeBlock from "../components/CodeBlock.astro";
+---
+
+
+
+
+ Install
+
+ QueryGym runs on Python 3.9+. The default install is dependency-light;
+ add extras for HuggingFace datasets, BEIR, Pyserini, or the
+ reproducibility tooling.
+
+
+
diff --git a/web/site/src/pages/methods.astro b/web/site/src/pages/methods.astro
new file mode 100644
index 0000000..1a99011
--- /dev/null
+++ b/web/site/src/pages/methods.astro
@@ -0,0 +1,26 @@
+---
+import Default from "../layouts/Default.astro";
+import MethodGrid from "../components/MethodGrid.astro";
+---
+
+
+
+
+ Methods
+
+ Each method ships with a registered prompt in the prompt bank, an
+ @register_method(...) entry, and a paper
+ reference. Switch between them by name in the API or CLI.
+
+
+
diff --git a/web/site/src/pages/reproducibility.astro b/web/site/src/pages/reproducibility.astro
new file mode 100644
index 0000000..8fb97ed
--- /dev/null
+++ b/web/site/src/pages/reproducibility.astro
@@ -0,0 +1,85 @@
+---
+import Default from "../layouts/Default.astro";
+import CodeBlock from "../components/CodeBlock.astro";
+---
+
+
+
+
+ Reproducibility, by design.
+
+ Every QueryGym run produces a single JSON file conforming to a public,
+ versioned schema. Submissions to the leaderboard carry that JSON file, a
+ TREC-format run file, and the reformulated queries. Together they let
+ anyone reconstruct the experiment from a fresh clone.
+
+ Run the example pipeline, then use submit_run.py
+ and open a PR. CI validates the JSON against the schema; a maintainer
+ verifies the numbers locally before merge.
+
+
+ {`# 1. Run the example pipeline
+python examples/querygym_pyserini/pipeline.py \\
+ --dataset msmarco-v1-passage.trecdl2019 \\
+ --method query2e --model gpt-4.1-mini \\
+ --output-dir outputs/dl19_query2e
+
+# 2. Copy into the canonical layout
+python -m reproducibility.scripts.submit_run \\
+ --from-dir outputs/dl19_query2e
+
+# 3. Regenerate the aggregate
+make repro-aggregate
+
+# 4. Open a PR
+git add reproducibility/data/ && git commit && git push
+gh pr create`}
+
+
+
+ Papers
+
+ QueryGym is backed by two papers: the toolkit demo and a multi-LLM
+ reproduction study. Both link directly to the committed corpus.
+