Calibration loss: scientific display + $0-target breakdown #20

Merged: PavelMakarchuk merged 4 commits into main from fix-zero-target-rel-error on Jun 17, 2026
@PavelMakarchuk

Contributor

Lands the calibration-loss display fix on main (the populace dashboard's other work is already on main via #18; the net diff here is just this fix).

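The "Calibration loss" KPI showed an 18-digit integer (752,472,440,359,538,600). That raw loss is the optimizer's weighted squared-dollar error, and 99–100% of it comes from 42 $0-reference SOI targets. For those targets the term ((est−target)/(|target|+1))² has a denominator of 1, so it reduces to the squared dollar miss and no longer measures relative error. Genuine fit on the remaining targets: median |rel error| ≈ 31%, with 27% of targets within 10%.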

  • fmtSci scientific notation (7.52e17)
  • lossBreakdown(): n_zero_target, loss_excl_zero_target, zero_target_loss_share, median_abs_rel_error
  • KPI: scientific value + relative-reduction delta + honest tooltip
  • Convergence card: amber "Why so large?" note showing raw AND $0-excluded loss
  • compare view: scientific loss
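
For readers who want to see the display change concretely, here is a minimal sketch of a scientific-notation formatter in the spirit of fmtSci. The signature, the `digits` parameter and the "n/a" fallback are assumptions for illustration; the actual helper in format.ts may differ.

```typescript
// Hypothetical sketch: the real fmtSci in format.ts may use a different
// signature, precision, or fallback string.
function fmtSci(value: number | null | undefined, digits = 2): string {
  // Missing or non-finite values get a placeholder (assumed behaviour).
  if (value == null || !Number.isFinite(value)) return "n/a";
  if (value === 0) return "0";
  // toExponential yields e.g. "7.52e+17"; drop the "+" for the "7.52e17" style.
  return value.toExponential(digits).replace("e+", "e");
}

// The raw loss quoted above, written as a float literal.
const rawLoss = 7.524724403595386e17;
console.log(fmtSci(rawLoss)); // 7.52e17
```

Negative exponents keep their sign, so small values still read naturally (for example `1.23e-3`).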

Consumer-side mitigation; producer should drop $0 targets from the loss (follow-up).

🤖 Generated with Claude Code

PavelMakarchuk and others added 4 commits June 16, 2026 11:19
SOI cells that are genuinely $0 (the under-$1 AGI band, several EITC-by-AGI
cells) were rendering their dollar miss as a percentage — a $52B estimate
against a $0 target showed "5191840415890%", which also dominated the worst-fit
ranking and made the calibration look broken.

Relative error is undefined when the target is 0. enrichTargetRow now nulls it
out (overriding any value the producer published), exposes the absolute miss as
abs_error, and flags the row with target_is_zero. The target diagnostics table
shows "<miss> vs 0" for those rows instead of a bogus percent, and they sort to
the bottom of the relative-error ranking instead of the top.

Large-but-finite percentages against small non-zero targets are left as-is —
those are real calibration misses, not artifacts.
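
The behaviour described above can be sketched roughly as follows. Only `rel_error`, `abs_error` and `target_is_zero` come from the commit message; the row shape, the other field names and the `byWorstRelError` comparator are assumptions made for illustration, not the actual enrichTargetRow.

```typescript
// Hypothetical row shape: name, target and estimate are assumed field names.
interface TargetRow {
  name: string;
  target: number;
  estimate: number;
  rel_error?: number | null; // may already be published by the producer
}

interface EnrichedRow extends TargetRow {
  rel_error: number | null;
  abs_error: number;
  target_is_zero: boolean;
}

function enrichTargetRow(row: TargetRow): EnrichedRow {
  const abs_error = Math.abs(row.estimate - row.target);
  const target_is_zero = row.target === 0;
  // Relative error is undefined for a $0 target, so null it out even when
  // the producer published a value. Non-zero targets keep their value,
  // however large, because those are real misses.
  const rel_error = target_is_zero
    ? null
    : row.rel_error ?? (row.estimate - row.target) / Math.abs(row.target);
  return { ...row, rel_error, abs_error, target_is_zero };
}

// Worst |rel_error| first; zero-target rows (null) sink to the bottom.
function byWorstRelError(a: EnrichedRow, b: EnrichedRow): number {
  if (a.rel_error === null && b.rel_error === null) return 0;
  if (a.rel_error === null) return 1;
  if (b.rel_error === null) return -1;
  return Math.abs(b.rel_error) - Math.abs(a.rel_error);
}
```

With this shape, a view can branch on `target_is_zero` to render "<miss> vs 0" from `abs_error` instead of a percentage.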

9 bun tests (added a zero-target regression case); tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The ledger-backed target surface names targets with dot-separated paths
(irs_soi.ty2022.table_2_5.eitc_by_agi_children.no_qualifying_children.25k_to_30k.eitc_total)
rather than the slash form (nation/irs/…). The parser only understood slashes,
so every dotted name fell whole into `source`, the variable browser showed 5,229
ungrouped rows, and geography/breakdowns were empty.

parseTarget now dispatches by name shape; parseDottedTarget decomposes the
dotted scheme into the same canonical {source, variable, measure, geography,
breakdown} the slash names produce:

- source label per family (irs_soi → "IRS SOI", jct → "JCT", census_stc →
  "Census STC", …); period tokens (ty2022/fy2024/cy2024/oep2024/month2025_12)
  stripped wherever they appear.
- measure from the leaf word: …_amount/_total → amount, …_returns/_claims/
  _count → count, so a variable's amount and count rows group separately.
- geography (us → United States, two-letter / AL_total → state), AGI bands
  (under_1 → "AGI <$1", 25k_to_30k → "AGI $25k–$30k"), and children
  (no_qualifying_children → "0 children") classified and faceted.
- pure-measure leaves (revenue_loss, projected_amount, collections) take the
  descriptive path token instead ("salt deduction", "adjusted gross income"),
  skipping code-like tokens ("t40").
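
A rough sketch of three of the rules listed above: dispatch by name shape, period-token stripping, and measure-from-leaf. All function names here (`nameScheme`, `measureFromLeaf`, `parseDottedSketch`) are hypothetical, the source-label table is only the subset quoted in this message, and geography, AGI-band and children classification are left out. It should not be read as the actual parseDottedTarget.

```typescript
// Period tokens named in the commit message: ty2022, fy2024, cy2024,
// oep2024, month2025_12. The exact pattern is an assumption.
const PERIOD_TOKEN = /^(?:(?:ty|fy|cy|oep)\d{4}|month\d{4}_\d{2})$/;

// Subset of the family labels quoted above; the real table covers more.
const SOURCE_LABELS: Record<string, string> = {
  irs_soi: "IRS SOI",
  jct: "JCT",
  census_stc: "Census STC",
};

type Measure = "amount" | "count" | null;

// Dispatch by name shape: anything containing a slash is treated as the
// older slash scheme (assumed heuristic).
function nameScheme(name: string): "slash" | "dotted" {
  return name.includes("/") ? "slash" : "dotted";
}

// Classify the leaf word: ..._amount/_total are amounts,
// ..._returns/_claims/_count are counts.
function measureFromLeaf(leaf: string): Measure {
  if (/(?:^|_)(?:amount|total)$/.test(leaf)) return "amount";
  if (/(?:^|_)(?:returns|claims|count)$/.test(leaf)) return "count";
  return null;
}

function parseDottedSketch(name: string) {
  // Strip period tokens wherever they appear in the path.
  const tokens = name.split(".").filter((t) => !PERIOD_TOKEN.test(t));
  const family = tokens[0] ?? "";
  const path = tokens.slice(1);
  const leaf = path[path.length - 1] ?? "";
  return {
    source: SOURCE_LABELS[family] ?? family,
    measure: measureFromLeaf(leaf),
    path,
  };
}
```

Applied to the example name quoted earlier in this message, the sketch yields source "IRS SOI", measure "amount", and a five-token path with `ty2022` removed.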

Result on the current release: 5,229 names → 83 clean variables across all 11
families (IRS SOI, JCT, CBO, Census STC, CMS ×3, HHS, SSA, USDA), 0 unparsed;
income-band and children facets work. Slash names are unchanged. 4 new tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Comparing the 457-target slash release to the 5,229-target dotted release showed
"0 in common" and empty improved/regressed tables, because the per-target diff
matches on the raw name and the two releases use different naming conventions.

buildComparison now also produces a variable_comparison: it groups each
release's targets by a scheme-independent canonical variable key
(irs_soi/IRS SOI → irs, "…_expenditure" trimmed, measure folded) and reports the
per-variable fit (n, within-10%, mean |err|) for the variables present in both.
Across the 0cdbb27→f32c2e5 boundary that recovers 11 shared variables — the 5
JCT tax expenditures and 6 IRS income lines — e.g. adjusted gross income 89%→60%
within 10%, taxable pension 100%→18%.
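
The grouping step might look something like the sketch below. The `ParsedTarget` shape, the key format and the normalisation details are assumptions; in particular the sketch simply leaves the measure out of the key, which is only one reading of "measure folded". It is not the actual buildComparison.

```typescript
// Hypothetical shape of an already-parsed target.
interface ParsedTarget {
  source: string; // e.g. "irs_soi" or "IRS SOI"
  variable: string; // e.g. "adjusted gross income"
  rel_error: number | null;
}

interface VariableFit {
  n: number;
  within10: number; // share of targets with |rel_error| <= 10%
  meanAbsErr: number;
}

// Scheme-independent key: lower-case, IRS SOI aliases mapped to "irs",
// trailing "_expenditure" trimmed (assumed normalisation).
function canonicalKey(t: ParsedTarget): string {
  const src = t.source.toLowerCase().replace(/\s+/g, "_");
  const source = src === "irs_soi" ? "irs" : src;
  const variable = t.variable
    .toLowerCase()
    .replace(/[\s_]+/g, "_")
    .replace(/_expenditure$/, "");
  return source + ":" + variable;
}

function fitByVariable(targets: ParsedTarget[]): Record<string, VariableFit> {
  const groups: Record<string, number[]> = {};
  for (const t of targets) {
    if (t.rel_error === null) continue; // zero-target rows carry no relative error
    const key = canonicalKey(t);
    (groups[key] = groups[key] ?? []).push(Math.abs(t.rel_error));
  }
  const out: Record<string, VariableFit> = {};
  for (const key of Object.keys(groups)) {
    const errs = groups[key]!;
    out[key] = {
      n: errs.length,
      within10: errs.filter((e) => e <= 0.1).length / errs.length,
      meanAbsErr: errs.reduce((a, b) => a + b, 0) / errs.length,
    };
  }
  return out;
}

// Keep only the variables present in both releases.
function compareVariables(a: ParsedTarget[], b: ParsedTarget[]) {
  const fa = fitByVariable(a);
  const fb = fitByVariable(b);
  return Object.keys(fa)
    .filter((k) => k in fb)
    .map((k) => ({ key: k, a: fa[k]!, b: fb[k]! }));
}
```

Matching on a derived key rather than the raw name is what lets two releases with different naming conventions line up at all; the cost is that the comparison is per variable, not per target.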

The compare view shows a "Variable-level fit (A → B)" table (always), and only
renders the per-target improved/regressed tables when there are common targets.

1 new test (15 total); tsc clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…wn (#19)

The published loss is the optimizer's raw weighted squared-dollar error
(~7.5e17), rendered as an 18-digit integer. It is 99-100% driven by 42
$0-reference SOI targets, where the relative-error term
((est-target)/(|target|+1))^2 divides by 1 and so reduces to the squared
dollar miss (populace estimates billions against a $0 cell). The loss
therefore measures those outliers, not fit quality.

- format.ts: fmtSci scientific-notation helper
- latest-artifact.ts: lossBreakdown() derives n_zero_target,
  loss_excl_zero_target, zero_target_loss_share, median_abs_rel_error
  from the per-target rows
- overview: scientific-notation KPI + relative-reduction delta + honest
  tooltip + an amber 'Why so large?' note showing the raw AND the
  $0-excluded loss (median |rel error| ~31%)
- compare view: scientific-notation loss
- use-populace: types for the new fields

Consumer-side mitigation only; the producer should drop $0 targets from
the loss (or use absolute error for them).
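
As a rough illustration of how the four derived fields could be computed from per-target rows, here is a sketch. The output field names come from the list above; the `LossRow` shape is assumed, and the per-target term is taken unweighted because the optimizer's weights are not given here, so the numbers it produces would not match the published loss.

```typescript
// Hypothetical per-target row; the real rows likely carry more fields.
interface LossRow {
  target: number;
  estimate: number;
}

interface LossBreakdown {
  n_zero_target: number;
  loss_excl_zero_target: number;
  zero_target_loss_share: number;
  median_abs_rel_error: number | null;
}

// Per-target term quoted in the commit message, without weights (assumption).
function lossTerm(r: LossRow): number {
  return Math.pow((r.estimate - r.target) / (Math.abs(r.target) + 1), 2);
}

function median(xs: number[]): number | null {
  if (xs.length === 0) return null;
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid]! : (s[mid - 1]! + s[mid]!) / 2;
}

function lossBreakdown(rows: LossRow[]): LossBreakdown {
  let zeroLoss = 0;
  let nonZeroLoss = 0;
  let nZero = 0;
  const absRel: number[] = [];
  for (const r of rows) {
    const term = lossTerm(r);
    if (r.target === 0) {
      nZero += 1;
      zeroLoss += term;
    } else {
      // Accumulate separately rather than subtracting from the total, so a
      // huge zero-target sum cannot swamp the small remainder in floating point.
      nonZeroLoss += term;
      absRel.push(Math.abs((r.estimate - r.target) / r.target));
    }
  }
  const total = zeroLoss + nonZeroLoss;
  return {
    n_zero_target: nZero,
    loss_excl_zero_target: nonZeroLoss,
    zero_target_loss_share: total > 0 ? zeroLoss / total : 0,
    median_abs_rel_error: median(absRel),
  };
}
```

The median is taken only over non-zero targets, since relative error has no meaning for a $0 reference.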

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk merged commit 8b87de9 into main Jun 17, 2026