Context
We have no objective way to measure answer quality: right now, improvements to scraping, retrieval, and prompts are judged by eyeballing. A small, hand-curated golden dataset of question → ideal answer → expected source URLs fixes that: later eval tickets (retrieval@k, answer-faithfulness) will score the bot against it. This ticket creates that dataset.
Schema
evals/golden.yaml is a list of entries; each entry has exactly these three fields:
- question: "How do I get into a COMP course that's full?"
  expected_answer: "A concise, factual answer grounded only in the source page(s) below."
  expected_sources:
    - "https://ccss.carleton.ca/resources/faqs/some-faq/"
    - "https://..."
- question: "..."
  expected_answer: "..."
  expected_sources:
    - "https://..."
- question (string): what a student would actually ask.
- expected_answer (string): the ideal grounded answer, concise and in the bot's style.
- expected_sources (list of strings): the URL(s) of the page(s) that actually contain the answer.
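As a rough illustration of the three-field contract above, here is a minimal Python sketch that checks already-parsed entries against it. The function name `validate_entries` and the message wording are hypothetical and not part of this ticket; the sketch only assumes the file parses to a list of mappings.

```python
from typing import Any

# The three field names this ticket fixes as the contract.
REQUIRED_FIELDS = {"question", "expected_answer", "expected_sources"}


def validate_entries(entries: Any) -> list[str]:
    """Return human-readable schema problems; an empty list means none found."""
    if not isinstance(entries, list):
        return ["top level must be a list of entries"]
    problems: list[str] = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a mapping")
            continue
        keys = set(entry)
        if keys != REQUIRED_FIELDS:
            # Report both directions so a renamed field is easy to spot.
            missing = sorted(REQUIRED_FIELDS - keys)
            extra = sorted(str(k) for k in keys - REQUIRED_FIELDS)
            problems.append(f"entry {i}: missing={missing} extra={extra}")
            continue
        for field in ("question", "expected_answer"):
            value = entry[field]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: {field} must be a non-empty string")
        sources = entry["expected_sources"]
        if not isinstance(sources, list) or not sources:
            problems.append(f"entry {i}: expected_sources must be a non-empty list")
        elif not all(isinstance(s, str) and s.strip() for s in sources):
            problems.append(f"entry {i}: expected_sources must contain only non-empty strings")
    return problems


# Illustrative data only: the second entry uses a wrong field name on purpose.
sample = [
    {
        "question": "How do I get into a COMP course that's full?",
        "expected_answer": "A concise, factual answer grounded only in the source page(s) below.",
        "expected_sources": ["https://ccss.carleton.ca/resources/faqs/some-faq/"],
    },
    {"question": "...", "answer": "...", "expected_sources": []},
]

for problem in validate_entries(sample):
    print(problem)
```

If PyYAML happens to be available (an assumption, since this ticket names no dependencies), the result of `yaml.safe_load` on evals/golden.yaml could be passed straight to `validate_entries`.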
How to build it
- Create evals/golden.yaml.
- Start from the CCSS FAQ collection: https://ccss.carleton.ca/resources/#faqs-heading. Each useful FAQ becomes an entry: the FAQ's question → question, its answer (reworded concisely) → expected_answer, and the page URL plus any sources it links → expected_sources. Feel free to curate other entries outside of CCSS FAQs as well.
- Aim for ~15-20 entries to start (the set can grow later). Spread them across topics: registration, course information, co-op, etc.
- Deliberately include several questions whose answers live in FAQ accordion content, such as the FAQs at the end of this page: https://carleton.ca/scs/current-students/bachelor-of-cybersecurity/bcyber-courses-and-registration/. These are the cases that measure scraper quality.
- Prefer expected_sources that are in-scope Carleton CS pages, ideally ones already in data/webpages/list.json or slated for it.
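While curating, a quick first-pass scope filter can catch obviously wrong sources. The sketch below is only an approximation: it assumes "in-scope" can be roughly tested as "hosted on carleton.ca or one of its subdomains", which is broader than "Carleton CS page". The authoritative list is data/webpages/list.json, whose format is not described in this ticket, so it is not read here. The names `is_carleton_url` and `out_of_scope_sources` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: carleton.ca and its subdomains stand in for "in-scope".
CARLETON_DOMAIN = "carleton.ca"


def is_carleton_url(url: str) -> bool:
    """True if url is http(s) and its host is carleton.ca or a subdomain of it."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_domain = host == CARLETON_DOMAIN or host.endswith("." + CARLETON_DOMAIN)
    return parts.scheme in ("http", "https") and on_domain


def out_of_scope_sources(entries: list[dict]) -> list[str]:
    """Collect every expected_sources URL that fails the host check, in order."""
    return [
        url
        for entry in entries
        for url in entry.get("expected_sources", [])
        if not is_carleton_url(url)
    ]
```

Matching on the parsed host rather than on a substring of the URL avoids accepting look-alike hosts such as `carleton.ca.example.com`. Placeholder URLs like `https://...` are also flagged, which is a useful reminder to fill them in.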
Notes
- This is a cross-component contract: keep the field names exactly as above, because later eval scripts parse them. If you think a field needs adding or renaming, check with Jacc first (same rule as the shared domain types).
- Answers must be grounded in the cited page, not invented or pulled from general knowledge; the whole point is to test grounded retrieval.
- expected_sources must be real, resolving URLs.
- No code or tests to run.
- The eval scripts that consume this file are separate, later tickets; this ticket just produces the data.
Acceptance Criteria
- evals/golden.yaml exists and is valid YAML.
- ~15-20 entries, each with question, expected_answer, and a non-empty expected_sources list, using exactly those field names.
- Several entries target FAQ/accordion content.
- Every expected_sources URL resolves and is an in-scope Carleton CS page.
- Answers are concise and grounded in their cited sources (not general knowledge).
- Answers are concise and grounded in their cited sources (not general knowledge).
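The "every URL resolves" criterion can be spot-checked by hand, or with something like the optional sketch below. It is not a deliverable of this ticket. It assumes "resolves" means the request ends in a 2xx status after redirects, and the helper names `http_status` and `unresolved_sources` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if no usable response arrived."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status such as 404
    except (OSError, ValueError, http.client.HTTPException):
        return 0  # DNS failure, timeout, refused connection, or malformed URL


def unresolved_sources(
    entries: Iterable[dict],
    status_of: Callable[[str], int] = http_status,
) -> list[tuple[str, int]]:
    """Return (url, status) for each distinct source URL without a 2xx status."""
    cache: dict[str, int] = {}
    for entry in entries:
        for url in entry["expected_sources"]:
            if url not in cache:  # check each distinct URL only once
                cache[url] = status_of(url)
    return [(url, status) for url, status in cache.items() if not 200 <= status < 300]
```

Calling `unresolved_sources(entries)` on the parsed file returns an empty list when every source responds with a 2xx status. Some servers reject HEAD requests, so a non-2xx result is a prompt to open the page manually, not proof that the link is dead. The `status_of` parameter exists so the check can be exercised without network access.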