Fix nonsense relative error on zero-target rows by PavelMakarchuk · Pull Request #18 · PolicyEngine/calibration-diagnostics

PavelMakarchuk · 2026-06-16T15:19:26Z

Problem

The Target diagnostics table rendered astronomical percentages like 5,191,840,415,890% for SOI cells whose target is genuinely $0 (the under-$1 AGI band, several EITC-by-AGI cells). Relative error against a $0 target is undefined; the code fell back to the raw dollar miss and displayed it as a percent. These bogus rows also dominated the worst-fit sort and made the calibration look broken.

Fix

- relativeError() returns null when the target is 0.
- enrichTargetRow nulls relative_error/abs_relative_error for zero-target rows (overriding any value the producer published), exposes the dollar miss as abs_error, and sets target_is_zero.
- The targets table shows "<miss> vs 0" for those rows instead of a percent, and they sort to the bottom of the relative-error ranking.
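
The zero-target handling listed above could look roughly like the following. This is a minimal sketch, not the merged code: the row shapes, the sign convention of the relative error, the choice to keep a producer-published value for non-zero targets, and the `formatRelativeError` helper are all assumptions made for illustration.

```typescript
// Hypothetical row shape for illustration; the real fields may differ.
interface TargetRow {
  name: string;
  target: number;
  estimate: number;
  relative_error?: number | null; // value published by the producer, if any
}

interface EnrichedTargetRow {
  name: string;
  target: number;
  estimate: number;
  relative_error: number | null;
  abs_relative_error: number | null;
  abs_error: number;
  target_is_zero: boolean;
}

// Relative error is undefined against a zero target, so return null.
function relativeError(estimate: number, target: number): number | null {
  if (target === 0) return null;
  return (estimate - target) / Math.abs(target);
}

function enrichTargetRow(row: TargetRow): EnrichedTargetRow {
  const targetIsZero = row.target === 0;
  // Zero target: discard whatever the producer published.
  // Non-zero target: keep the published value, else compute it (assumption).
  const rel = targetIsZero
    ? null
    : row.relative_error ?? relativeError(row.estimate, row.target);
  return {
    name: row.name,
    target: row.target,
    estimate: row.estimate,
    relative_error: rel,
    abs_relative_error: rel === null ? null : Math.abs(rel),
    abs_error: Math.abs(row.estimate - row.target),
    target_is_zero: targetIsZero,
  };
}

// Hypothetical display helper: "<miss> vs 0" for zero targets, a percent otherwise.
function formatRelativeError(row: EnrichedTargetRow): string {
  if (row.target_is_zero) return `${row.abs_error} vs 0`;
  if (row.relative_error === null) return "n/a";
  return `${(row.relative_error * 100).toFixed(1)}%`;
}
```

Keeping `abs_error` on every row means the dollar miss stays available for display even when no percentage can be computed.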

Verified live against the current release: zero-target rows no longer show a percent; the remaining large percentages all have non-zero targets (e.g. a $535K target vs a ~$10B estimate) — real calibration misses, left untouched.

Tests

bun test — 9 pass (added a zero-target regression). tsc --noEmit clean.

🤖 Generated with Claude Code

SOI cells that are genuinely $0 (the under-$1 AGI band, several EITC-by-AGI cells) were rendering their dollar miss as a percentage: a $52B estimate against a $0 target showed "5191840415890%", which also dominated the worst-fit ranking and made the calibration look broken.

Relative error is undefined when the target is 0. enrichTargetRow now nulls it out (overriding any value the producer published), exposes the absolute miss as abs_error, and flags the row with target_is_zero. The target diagnostics table shows "<miss> vs 0" for those rows instead of a bogus percent, and they sort to the bottom of the relative-error ranking instead of the top.

Large-but-finite percentages against small non-zero targets are left as-is; those are real calibration misses, not artifacts.

9 bun tests (added a zero-target regression case); tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
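
One way to get the "sort to the bottom" behaviour is a comparator that treats a null relative error as worse-ranked than any number. The sketch below assumes a hypothetical `RankedRow` shape and a descending worst-fit sort; the actual comparator in the repository may be written differently.

```typescript
// Hypothetical minimal row shape for the ranking.
interface RankedRow {
  name: string;
  abs_relative_error: number | null;
}

// Descending by abs_relative_error, with null (zero-target) rows always last.
function byWorstFit(a: RankedRow, b: RankedRow): number {
  const ea = a.abs_relative_error;
  const eb = b.abs_relative_error;
  if (ea === null && eb === null) return 0;
  if (ea === null) return 1; // a has no percentage: push it down
  if (eb === null) return -1; // b has no percentage: a goes first
  return eb - ea;
}

const ranked: RankedRow[] = [
  { name: "zero-target cell", abs_relative_error: null },
  { name: "small miss", abs_relative_error: 0.05 },
  { name: "large miss", abs_relative_error: 18000 },
].sort(byWorstFit);
```

Handling null explicitly avoids relying on how `null` coerces in arithmetic, which would otherwise treat it as 0.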

The ledger-backed target surface names targets with dot-separated paths (irs_soi.ty2022.table_2_5.eitc_by_agi_children.no_qualifying_children.25k_to_30k.eitc_total) rather than the slash form (nation/irs/…). The parser only understood slashes, so every dotted name fell whole into `source`, the variable browser showed 5,229 ungrouped rows, and geography/breakdowns were empty.

parseTarget now dispatches by name shape; parseDottedTarget decomposes the dotted scheme into the same canonical {source, variable, measure, geography, breakdown} the slash names produce:

- source label per family (irs_soi → "IRS SOI", jct → "JCT", census_stc → "Census STC", …); period tokens (ty2022/fy2024/cy2024/oep2024/month2025_12) stripped wherever they appear.
- measure from the leaf word: …_amount/_total → amount, …_returns/_claims/_count → count, so a variable's amount and count rows group separately.
- geography (us → United States, two-letter / AL_total → state), AGI bands (under_1 → "AGI <$1", 25k_to_30k → "AGI $25k–$30k"), and children (no_qualifying_children → "0 children") classified and faceted.
- pure-measure leaves (revenue_loss, projected_amount, collections) take the descriptive path token instead ("salt deduction", "adjusted gross income"), skipping code-like tokens ("t40").

Result on the current release: 5,229 names → 83 clean variables across all 11 families (IRS SOI, JCT, CBO, Census STC, CMS ×3, HHS, SSA, USDA), 0 unparsed; income-band and children facets work. Slash names are unchanged. 4 new tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
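
The dispatch, period-token stripping, and leaf-based measure rules described above might be sketched as follows. The helper names (`isDottedName`, `stripPeriodTokens`, `measureFromLeaf`), the `"other"` fallback, and the exact regular expressions are assumptions for illustration; only the token examples come from the commit message.

```typescript
type Measure = "amount" | "count" | "other";

// Matches the period tokens named in the description:
// ty2022, fy2024, cy2024, oep2024, month2025_12.
const PERIOD_TOKEN = /^(?:ty|fy|cy|oep)\d{4}$|^month\d{4}_\d{2}$/;

// Assumed name-shape test: dotted names contain dots and no slashes.
function isDottedName(name: string): boolean {
  return name.includes(".") && !name.includes("/");
}

// Drop period tokens wherever they appear in the path.
function stripPeriodTokens(tokens: string[]): string[] {
  return tokens.filter((token) => !PERIOD_TOKEN.test(token));
}

// Classify the measure from the suffix of the leaf word.
function measureFromLeaf(leaf: string): Measure {
  if (/_(?:amount|total)$/.test(leaf)) return "amount";
  if (/_(?:returns|claims|count)$/.test(leaf)) return "count";
  return "other"; // hypothetical fallback for pure-measure leaves
}

const example =
  "irs_soi.ty2022.table_2_5.eitc_by_agi_children.no_qualifying_children.25k_to_30k.eitc_total";
const tokens = stripPeriodTokens(example.split("."));
const leaf = tokens[tokens.length - 1] ?? "";
const measure = measureFromLeaf(leaf);
```

Stripping period tokens before classification means the same variable from different tax or fiscal years collapses to one canonical key.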

PavelMakarchuk · 2026-06-16T15:33:35Z

Second commit on this branch also canonicalizes the dotted ledger target names so both naming schemes are supported.

This release names targets with dot paths (irs_soi.ty2022.table_2_5.eitc_by_agi_children.no_qualifying_children.25k_to_30k.eitc_total) instead of the slash form (nation/irs/…). The parser only understood slashes, so the variable browser showed 5,229 ungrouped rows with empty geography/breakdowns.

parseTarget now dispatches by name shape and parseDottedTarget decomposes the dotted scheme into the same {source, variable, measure, geography, breakdown}:

- 5,229 names → 83 clean variables across all 11 families (IRS SOI, JCT, CBO, Census STC, CMS ×3, HHS, SSA, USDA SNAP), 0 unparsed.
- …_amount/…_total → amount, …_returns/…_claims → count (so a variable's amount and count rows group separately, e.g. IRS SOI / eitc · amount vs · count).
- AGI bands humanized (under_1 → "AGI <$1", 25k_to_30k → "AGI $25k–$30k"), children (no_qualifying_children → "0 children"), geography populated; income-band and children facets work.
- Pure-measure leaves take the descriptive token (jct.…salt_deduction.revenue_loss → "salt deduction", census_stc.…individual_income_tax_collections.al.t40.collections → "individual income tax collections", skipping the t40 code).
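
The band and children labels above can be produced with small token classifiers. The sketch below is illustrative only: `humanizeAgiBand` and `childrenLabel` are hypothetical names, the regular expressions assume bands are written as digits with an optional `k` or `m` suffix, and only the one children token quoted in this comment is mapped.

```typescript
// Turn an AGI band token into a display label, or null if it is not a band.
function humanizeAgiBand(token: string): string | null {
  const under = /^under_(\d+[km]?)$/.exec(token);
  if (under) return "AGI <$" + under[1];
  const range = /^(\d+[km]?)_to_(\d+[km]?)$/.exec(token);
  if (range) return "AGI $" + range[1] + "–$" + range[2];
  return null;
}

// Only the token quoted in the PR is mapped; other children tokens are left out
// rather than guessed.
const CHILDREN_LABELS: Record<string, string> = {
  no_qualifying_children: "0 children",
};

function childrenLabel(token: string): string | null {
  return CHILDREN_LABELS[token] ?? null;
}
```

Returning null for unrecognized tokens lets the caller try the next classifier (geography, variable name) instead of mislabeling the token.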

Slash names unchanged. 4 new tests (13 total).

Comparing the 457-target slash release to the 5,229-target dotted release showed "0 in common" and empty improved/regressed tables, because the per-target diff matches on the raw name and the two releases use different naming conventions.

buildComparison now also produces a variable_comparison: it groups each release's targets by a scheme-independent canonical variable key (irs_soi/IRS SOI → irs, "…_expenditure" trimmed, measure folded) and reports the per-variable fit (n, within-10%, mean |err|) for the variables present in both.

Across the 0cdbb27→f32c2e5 boundary that recovers 11 shared variables (the 5 JCT tax expenditures and 6 IRS income lines), e.g. adjusted gross income 89%→60% within 10%, taxable pension 100%→18%.

The compare view shows a "Variable-level fit (A → B)" table (always), and only renders the per-target improved/regressed tables when there are common targets.

1 new test (15 total); tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PavelMakarchuk · 2026-06-16T15:45:55Z

Third commit: variable-level comparison across naming schemes.

Comparing the 457-target slash release (A) to the 5,229-target dotted release (B) showed "0 in common" — the per-target diff matches on raw names, which differ entirely between the two conventions. buildComparison now also emits a variable_comparison that groups each release by a scheme-independent canonical variable key and reports per-variable fit (n, within-10%, mean |err|) for variables in both releases.

Across the migration boundary this recovers 11 shared variables (the 5 JCT tax expenditures + 6 IRS income lines), e.g.:

| Variable | A within 10% | B within 10% | Δ |
| --- | --- | --- | --- |
| adjusted gross income | 89% (70) | 60% (573) | −29% |
| taxable pension income | 100% (2) | 18% (124) | −82% |
| SALT deduction | 100% (1) | 100% (1) | 0 |

The compare view shows a "Variable-level fit (A → B)" table always, and the per-target improved/regressed tables only when common targets exist.
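
The per-variable fit described in this comment could be computed along these lines. This is a sketch under assumptions: rows are taken to already carry a canonical `variable` key, zero-target rows (null error) are skipped, "within 10%" is read as `abs_relative_error <= 0.1`, and the names `FitRow`, `fitByVariable`, and `variableComparison` are hypothetical rather than the ones used by buildComparison.

```typescript
interface FitRow {
  variable: string; // assumed: already the scheme-independent canonical key
  abs_relative_error: number | null;
}

interface VariableFit {
  n: number;
  within10: number; // share of targets with |err| <= 10%
  meanAbsErr: number;
}

function fitByVariable(rows: FitRow[]): Map<string, VariableFit> {
  const groups = new Map<string, number[]>();
  for (const row of rows) {
    if (row.abs_relative_error === null) continue; // skip zero-target rows (assumption)
    const errs = groups.get(row.variable) ?? [];
    errs.push(row.abs_relative_error);
    groups.set(row.variable, errs);
  }
  const fits = new Map<string, VariableFit>();
  groups.forEach((errs, variable) => {
    const within = errs.filter((e) => e <= 0.1).length;
    const sum = errs.reduce((acc, e) => acc + e, 0);
    fits.set(variable, { n: errs.length, within10: within / errs.length, meanAbsErr: sum / errs.length });
  });
  return fits;
}

interface VariableDelta {
  variable: string;
  a: VariableFit;
  b: VariableFit;
  deltaWithin10: number;
}

// Keep only variables present in both releases and report the change in fit.
function variableComparison(a: FitRow[], b: FitRow[]): VariableDelta[] {
  const fitA = fitByVariable(a);
  const fitB = fitByVariable(b);
  const shared: VariableDelta[] = [];
  fitA.forEach((fa, variable) => {
    const fb = fitB.get(variable);
    if (fb) shared.push({ variable, a: fa, b: fb, deltaWithin10: fb.within10 - fa.within10 });
  });
  return shared;
}
```

Because the join is on the canonical variable rather than the raw target name, the two releases can be compared even when no individual target names match.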

PavelMakarchuk and others added 2 commits June 16, 2026 11:19

PavelMakarchuk merged commit 9af0f95 into main Jun 16, 2026

PavelMakarchuk mentioned this pull request Jun 17, 2026

Calibration loss: scientific display + $0-target breakdown #20

Merged

PavelMakarchuk merged 3 commits into main from fix-zero-target-rel-error
