Cut wasted tokens (off-topic gate) and cold-start latency (/chat/backends warmup) by SakshiKekre · Pull Request #95 · PolicyEngine/policyengine-uk-chat

SakshiKekre · 2026-06-01T19:01:01Z

Two small, additive changes to the chat backend. Both are perf/UX fixes on the request critical path, and both are visible together in the same preview deploy.

1. Opt-in Haiku topic gate to short-circuit off-topic messages

Today the chat answers anything. "What's the capital of France?" returns Paris, then pivots to UK policy, burning input tokens on the full system prompt + reference doc and output tokens on the apology. Every off-topic message does this.

Rate limiting (PR #48) caps request volume; iteration capping (PR #87) bounds runaway loops. Neither prevents off-topic acceptance.

This PR adds a pre-step that classifies the last user message with a single Haiku call (~$0.001) and short-circuits with a canned SSE refusal if it's clearly off-topic.
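A minimal sketch of what a canned SSE refusal can look like, assuming the stream uses plain `data:` frames carrying JSON. The `done` event and the `refused_by_topic_gate` field are mentioned in this PR; the `text` event type, the `content` key, the function names and the refusal wording are hypothetical.

```python
import json
from typing import Any, Dict, Iterator

# Hypothetical wording; the real canned message lives in backend/routes/chatbot.py.
REFUSAL_TEXT = (
    "I can only help with questions about tax and benefit policy. "
    "Try asking how a reform would affect households."
)


def sse_frame(payload: Dict[str, Any]) -> str:
    """Encode one Server-Sent Events frame: a `data:` line and a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def topic_gate_refusal() -> Iterator[str]:
    """Yield the frames for a short-circuited, off-topic reply."""
    # Assumed event shape: a text chunk the frontend renders like a normal answer...
    yield sse_frame({"type": "text", "content": REFUSAL_TEXT})
    # ...then the terminal `done` event, flagged so a test or eval runner can count refusals.
    yield sse_frame({"type": "done", "refused_by_topic_gate": True})
```

If the route is FastAPI (the TestClient-level test suggests Starlette/FastAPI), the generator could be returned as `StreamingResponse(topic_gate_refusal(), media_type="text/event-stream")`, so the frontend consumes the refusal through the same code path as a normal answer.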

Off by default. Opt-in via:
```
POLICYENGINE_CHAT_TOPIC_GATE_ENABLED=true
POLICYENGINE_CHAT_TOPIC_GATE_MODEL=claude-haiku-4-5 # optional
```
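One way these two variables could be read, as a sketch: the helper names and the accepted truthy spellings (`1`, `true`, `yes`, case-insensitive) are assumptions, not necessarily what `chatbot.py` does. Anything other than an explicit truthy value leaves the gate off, which matches the off-by-default behaviour.

```python
import os

# Default model taken from the config example above; overridable via env.
DEFAULT_TOPIC_GATE_MODEL = "claude-haiku-4-5"


def topic_gate_enabled() -> bool:
    """Off unless POLICYENGINE_CHAT_TOPIC_GATE_ENABLED holds a truthy value."""
    raw = os.environ.get("POLICYENGINE_CHAT_TOPIC_GATE_ENABLED", "")
    return raw.strip().lower() in {"1", "true", "yes"}


def topic_gate_model() -> str:
    """Model for the classification call; unset or blank falls back to the default."""
    raw = os.environ.get("POLICYENGINE_CHAT_TOPIC_GATE_MODEL", "")
    return raw.strip() or DEFAULT_TOPIC_GATE_MODEL
```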

Calibration (boundary cases the classifier prompt is tuned for):

| Prompt | Classifier verdict | Why |
|---|---|---|
| "Capital of France?" | reject | unambiguously off-topic |
| "What did the chancellor say yesterday?" | reject | news, not policy |
| "How will the PA reform affect inflation?" | let through | eval A4: main loop should explain microsim vs macro |
| "What's the EITC?" | let through | factual policy lookup |
| ambiguous / malformed reply | let through | fail-open by design |

Wrongly rejecting an on-topic question is worse than wrongly accepting an off-topic one: the latter wastes a few cents, the former breaks the product.
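The fail-open rule is easiest to keep honest in the reply parser. A minimal sketch, assuming a hypothetical one-label output contract (`ON_TOPIC` / `OFF_TOPIC`); the real classifier prompt's output format is not shown in this PR. Only a clean off-topic label rejects, so the "ambiguous / malformed reply" row above falls through to "let through" by construction.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "allow"
    REJECT = "reject"


# Hypothetical output contract: the classifier is asked for exactly one label.
OFF_TOPIC_LABEL = "OFF_TOPIC"


def parse_verdict(reply: Optional[str]) -> Verdict:
    """Map the classifier's raw reply to a verdict, failing open.

    Only a reply that is exactly the off-topic label (ignoring case and
    surrounding whitespace) rejects. Empty, ambiguous or malformed replies,
    including extra prose around the label, are let through.
    """
    if not reply:
        return Verdict.ALLOW
    if reply.strip().upper() == OFF_TOPIC_LABEL:
        return Verdict.REJECT
    return Verdict.ALLOW
```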

2. Speed up /chat/backends from ~30-45s to <1s

Cold-container symptom: the backend-selector dropdown in the frontend was taking 30-45s to render after page load, while the chat input rendered fast. Root cause was `/chat/backends` paying for first-time imports of `policyengine_uk_compiled` + `policyengine_uk` (and `policyengine_us` on PR #54) inside `available_backends() → package_version()`.

Two small fixes:

`modal_app.py`: extend `_preload_engine` to also pre-import the Python backends (best-effort; failures non-fatal). Shifts the heavy OpenFisca import from request time to image build time.
`backend/model_backends.py`: memoise `available_backends()` output. The values don't change within a deploy, so `importlib.metadata.version()` only runs once per container.

Combined: `/chat/backends` returns in <100ms on a warm container and ~1s on cold, vs 30-45s today.

Why combined

Both small (~30 lines each), both touch the chat-message critical path, both visible together in the same preview deploy when testing. Topic gate is the bigger feature; the warmup is the perf fix you'd want for any demo where someone watches the page load.

Files

`backend/routes/chatbot.py` (+93): topic gate helpers + early-return wire-up
`backend/tests/test_topic_gate.py` (+86): classifier parser tests, fail-open, end-to-end gate stub
`backend/model_backends.py` (+26/-8): memoised `available_backends()`
`modal_app.py` (+18/-2): Python-backend pre-import

Stacked on PR #51

Base = `feat/model-backend-selector`. Once #51 merges, this auto-rebases to main.

Test plan

Confirm `/chat/backends` returns quickly on warm preview container
Open the preview, watch for the backend dropdown to render fast on first load (cold container)
Flip `POLICYENGINE_CHAT_TOPIC_GATE_ENABLED=true` on the preview's Modal secret; ask the four boundary cases and confirm behaviour matches the calibration table
Flip the env var back off; confirm baseline behaviour returns

Not in scope

Tuning the classifier prompt against a held-out eval set — manual examples for now
Telemetry for how often the gate fires (the runner from PR #52, "Add eval harness scaffold: spec, scenarios, fixtures dir", could surface `refused_by_topic_gate` from the `done` event; separate work)
Modal `min_containers=1` keep-warm — overkill for previews

Today every message, including "what's the capital of France?", hits the full chat loop: system prompt, reference doc, tools, and often several iterations before Claude decides the question isn't on-topic. Each one burns input + output tokens.

This adds a pre-step that runs the last user message through a single small classification call (Haiku by default) and short-circuits with a canned SSE refusal if it's clearly off-topic. It is wired into /chat/message after the billing check, before backend resolution.

Calibration choices (in the classifier's system prompt):

Reject only when unambiguously not policy (capitals, sports, news, general advice).
Let everything ambiguous through. Eval A4 ("how does this reform affect inflation?") is a deliberate let-through: the main loop's scope refusal is the right place to handle that, not a pre-filter.
Any classification error fails open. Wrongly rejecting an on-topic question is worse than wasting a few cents.

Gate is off by default. Opt-in via `POLICYENGINE_CHAT_TOPIC_GATE_ENABLED=true`; `POLICYENGINE_CHAT_TOPIC_GATE_MODEL=claude-haiku-4-5` is an optional model override.

Tests cover parser behaviour, the empty-input shortcut, error-path fail-open, and a TestClient-level check that the gate produces the expected SSE shape when on.
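The empty-input shortcut and error-path fail-open described above can be sketched as a small wrapper. This assumes messages are `{"role", "content"}` dicts with string content, and injects the classifier as a callable that returns `True` for off-topic (a hypothetical signature; the real helper calls Haiku directly). Injecting it is also what makes both paths easy to unit-test without a network call.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

Message = Dict[str, str]


def last_user_text(messages: List[Message]) -> str:
    """Content of the most recent user turn, or "" if there isn't one."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return (message.get("content") or "").strip()
    return ""


def topic_gate_refuses(messages: List[Message], classify: Callable[[str], bool]) -> bool:
    """True only when the classifier explicitly flags the last user message as off-topic."""
    text = last_user_text(messages)
    if not text:
        # Empty-input shortcut: nothing to classify, so skip the API call.
        return False
    try:
        # Only an explicit True refuses; any other return value lets the message through.
        return classify(text) is True
    except Exception:
        # Fail open: a broken gate must never block an on-topic question.
        logger.warning("topic gate failed; letting message through", exc_info=True)
        return False
```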

vercel · 2026-06-01T19:01:08Z

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| policyengine-uk-chat | Ready | Preview, Comment | Jun 2, 2026 2:20pm |

github-actions · 2026-06-01T19:03:53Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

Two small fixes that together remove a 30-45s cold-start wait on the backend-selector dropdown in the frontend.

1. modal_app.py: extend _preload_engine to also import policyengine_uk (and policyengine_us if installed). Best-effort; failures are non-fatal. Shifts the heavy OpenFisca import from request time to image build time.
2. model_backends.py: cache available_backends() output. The values don't change within a deploy, so importlib.metadata.version(), which can trigger the package import we're trying to avoid, only runs once per container.

Combined effect: /chat/backends returns in <100ms on a warm container and ~1s on cold, vs 30-45s today.
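A minimal sketch of a best-effort pre-import as a standalone function. The name `preload_python_backends` is hypothetical (the real change extends `_preload_engine` in `modal_app.py`), and how it is hooked into the Modal image or container lifecycle is not shown here. Once a module is imported it sits in `sys.modules`, so a later import at request time is effectively free.

```python
import importlib
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)

# Module names from this PR; policyengine_us is only present where that backend is installed.
PYTHON_BACKEND_MODULES = ("policyengine_uk", "policyengine_us")


def preload_python_backends(modules: Iterable[str] = PYTHON_BACKEND_MODULES) -> List[str]:
    """Import each backend once so later requests find it already loaded.

    Best-effort: a missing or broken package is logged and skipped, never raised.
    Returns the names that imported successfully.
    """
    loaded = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:
            logger.info("preload skipped %s", name, exc_info=True)
            continue
        loaded.append(name)
    return loaded
```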

The backend warmup brought the cold-start /chat/backends time down from 30-45s to ~12-15s (the remainder is Modal's own container cold start). During that window the dropdown rendered nothing, which reads as "broken." It now shows a small spinner and "Loading engines…" until the fetch resolves. This doesn't gate sending a message: UK compiled is the default anyway, so a user who sends before the dropdown settles still gets the right backend.

vercel Bot deployed to Preview June 1, 2026 19:01

SakshiKekre changed the title ~~Add opt-in Haiku topic gate to /chat/message~~ Cut wasted tokens (off-topic gate) and cold-start latency (/chat/backends warmup) Jun 2, 2026

vercel Bot deployed to Preview June 2, 2026 13:42

vercel Bot deployed to Preview June 2, 2026 14:20

SakshiKekre mentioned this pull request in #96, "Enable multizone embedding via prefixed asset URLs" (open, 4 tasks), Jun 2, 2026

SakshiKekre wants to merge 3 commits into `feat/model-backend-selector` from `feat/topic-gate`
