Comparisons can silently use a stale or broken local eCPS baseline #1163

## Problem

Tools that compare against "the eCPS" (microplex eCPS-replacement diagnostics, audits, etc.) resolve the baseline by pointing at a local `enhanced_cps_2024.h5` — whatever happens to be on disk. Nothing guarantees that file is the published production dataset, or that it is intact.

This has bitten us repeatedly. Most recently, a local rebuild's `enhanced_cps_2024.h5` was missing the `social_security_retirement` column entirely (dropped at the extended-CPS step), leaving the baseline ~64% short on total Social Security and 100% short on SS retirement. A comparison run against it scored a candidate dataset as "winning" on Social Security purely because the baseline was broken — a conclusion that does not hold against a healthy baseline. The published production eCPS for the same version is fine; only the local copy was broken.

## Proposed fix

Add a `production_baseline` resolver so that anything comparing against the eCPS always uses the same verified, pinned production dataset:

- `resolve_production_ecps()` downloads the `enhanced_cps_2024.h5` published on Hugging Face (HF) into the local Hugging Face cache, pinned to the installed `policyengine-us-data` version (uploads tag the HF commit with the version). It returns the path plus provenance (repo, revision, sha256, integrity checks).
- `assert_baseline_intact()` fails loudly if a required column is missing or all-zero, turning the silent failure mode into a hard error.
- A CLI (`python -m policyengine_us_data.utils.production_baseline`) and `make production-ecps` fetch and verify on demand.
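
As a rough illustration of the integrity gate described above, the sketch below checks an in-memory mapping of column names to values. The names `BaselineIntegrityError` and `check_columns`, and the mapping-based input, are assumptions made for this example; the real `assert_baseline_intact()` would presumably read columns from the `.h5` file and may differ in signature and behavior.

```python
from typing import Iterable, Mapping, Sequence


class BaselineIntegrityError(RuntimeError):
    """Raised when a baseline fails an integrity check (hypothetical name)."""


def check_columns(
    columns: Mapping[str, Sequence[float]],
    required: Iterable[str],
) -> None:
    """Raise if any required column is missing or contains no non-zero value.

    `columns` is a stand-in for data loaded from the baseline file.
    An empty column is reported as all-zero, since it has no non-zero value.
    """
    problems = []
    for name in required:
        if name not in columns:
            problems.append(f"{name}: missing")
        elif not any(value != 0 for value in columns[name]):
            problems.append(f"{name}: all zero")
    if problems:
        # Report every failing column at once rather than stopping at the first.
        raise BaselineIntegrityError(
            "baseline failed integrity checks: " + "; ".join(problems)
        )
```

Under this sketch, a baseline that lacks `social_security_retirement` raises immediately instead of flowing into a comparison, which is the failure mode described in the Problem section.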

Downstream consumers can default to this resolved baseline; explicit overrides remain available but should run the same integrity gate and be recorded as non-production in result provenance.
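
One way a consumer might record whether a run used the production baseline or an override is sketched below. The function names `sha256_of` and `baseline_provenance`, and the shape of the returned record, are hypothetical; the sketch assumes the production path has already been resolved and does not show the integrity gate, which would run on whichever file is selected.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline_provenance(
    resolved_path: Path,
    override_path: Optional[Path] = None,
) -> dict:
    """Select the baseline file and describe where it came from.

    `resolved_path` stands in for the path returned by the resolver;
    `override_path` is an explicit, non-production choice by the caller.
    """
    path = override_path if override_path is not None else resolved_path
    return {
        "path": str(path),
        "sha256": sha256_of(path),
        # Any explicit override is flagged so results are not mistaken
        # for a comparison against the published production dataset.
        "is_production": override_path is None,
    }
```

Storing this record next to comparison results makes it possible to tell later whether a "winning" score was measured against the pinned production file or against a local copy.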

