Problem
The calibration diagnostics dashboard can consume the public microplex-us artifacts under artifacts/, but those files only expose summary-level information:
- pe_us_data_rebuild_parity.json
- live_pe_us_data_rebuild_checkpoint_modelpass_regression_summary_20260410.json
- live_pe_us_data_rebuild_checkpoint_national_irs_other_drilldown_20260410.json
The current public JSONs report headline counts and losses, e.g. n_targets_kept, n_national_targets, n_state_targets, win rates, and broad native-loss metrics. They do not include the target-level rows needed to understand Microplex on its own: which aggregate values it produces, how those compare to the target oracle, and which target families drive error.
The us-data baseline is useful comparator context, but it should not be the only framing. Analysts need a Microplex-first target performance table.
Requested artifact
Please publish a run-level artifact that contains Microplex aggregate performance against the active PolicyEngine target oracle. Either of these would work:
- A full target diagnostics JSON/CSV, committed or otherwise publicly downloadable.
- The Microplex output H5, publicly downloadable, so downstream services can compute the same diagnostics.
Suggested target diagnostics schema
A row-oriented JSON or CSV would be easiest for dashboards/API consumers. Suggested fields:
- run_id / artifact_id
- candidate_dataset, e.g. microplex
- target_id
- variable
- entity
- geography / geo_level / state where applicable
- period
- target_value
- microplex_aggregate
- microplex_absolute_error
- microplex_relative_error
- loss_contribution or equivalent weighted term
- family / target group
- in_loss
- supported_by_microplex
Optional comparator fields are also useful, but should be treated as secondary context:
- baseline_dataset, e.g. enhanced_cps_2024.h5
- us_data_aggregate
- us_data_absolute_error
- us_data_relative_error
- delta_absolute_error
- delta_relative_error
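To make the suggested fields concrete, here is a minimal sketch of how one row might be assembled. The error definitions are assumptions, not something the current artifacts specify: absolute error is taken as |aggregate - target|, relative error as absolute error divided by |target| (null when the target is zero), and each delta as the Microplex error minus the us-data error. The helper name `build_row` and all numbers are hypothetical.

```python
import json
from typing import Optional


def abs_error(aggregate: float, target: float) -> float:
    # Assumed definition: unsigned distance from the target value.
    return abs(aggregate - target)


def rel_error(aggregate: float, target: float) -> Optional[float]:
    # Assumed definition: absolute error scaled by |target|.
    # Undefined for a zero target, so emit null rather than inf.
    if target == 0:
        return None
    return abs(aggregate - target) / abs(target)


def build_row(target_id, variable, family, target_value,
              microplex_aggregate, us_data_aggregate=None):
    """Assemble one target-level row (hypothetical helper)."""
    row = {
        "target_id": target_id,
        "variable": variable,
        "family": family,
        "target_value": target_value,
        "microplex_aggregate": microplex_aggregate,
        "microplex_absolute_error": abs_error(microplex_aggregate, target_value),
        "microplex_relative_error": rel_error(microplex_aggregate, target_value),
    }
    if us_data_aggregate is not None:
        # Comparator columns are optional, secondary context.
        row["us_data_aggregate"] = us_data_aggregate
        row["us_data_absolute_error"] = abs_error(us_data_aggregate, target_value)
        row["us_data_relative_error"] = rel_error(us_data_aggregate, target_value)
        # Assumed sign convention: positive means Microplex is further off.
        row["delta_absolute_error"] = (
            row["microplex_absolute_error"] - row["us_data_absolute_error"]
        )
        m_rel, u_rel = row["microplex_relative_error"], row["us_data_relative_error"]
        row["delta_relative_error"] = None if m_rel is None else m_rel - u_rel
    return row


# Illustrative values only; not taken from any real run.
example = build_row("example_target", "snap", "SNAP", 100.0, 110.0, 105.0)
print(json.dumps(example, indent=2))
```

Whatever definitions the artifact actually uses, stating them alongside the file would let downstream consumers reproduce the columns rather than guess at them.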
Why this matters
The calibration diagnostics dashboard now has a Microplex target-performance page and API endpoint, but it can only show rollups from the current public artifacts. Analysts want to answer questions like:
- How does Microplex perform against the target oracle by aggregate?
- Which aggregates drive Microplex native loss?
- Are failures concentrated in IRS SOI, state AGI distributions, ACA, SNAP, CTC, etc.?
- For a state/federal reform analysis, which upstream calibration nodes are trustworthy or suspect?
- Where is us-data useful comparator context, and where is Microplex simply unsupported or wrong against the target value?
Without target-level rows or an H5, the dashboard has to label the view as aggregate-only and cannot provide a full per-target/per-aggregate diagnostic view.
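As an illustration of what target-level rows would unlock, the sketch below ranks target families by their share of native loss. It assumes rows shaped like the suggested schema, with `family`, `in_loss`, and `loss_contribution` present, and that loss contributions are additive across targets. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def loss_by_family(rows):
    """Sum loss_contribution per family, largest first.

    Rows with in_loss set to false are skipped, on the assumption
    that they do not enter the native loss.
    """
    totals = defaultdict(float)
    for row in rows:
        if not row.get("in_loss"):
            continue
        totals[row["family"]] += row["loss_contribution"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative rows only; not taken from any real run.
rows = [
    {"family": "IRS SOI", "in_loss": True, "loss_contribution": 0.5},
    {"family": "SNAP", "in_loss": True, "loss_contribution": 0.25},
    {"family": "IRS SOI", "in_loss": True, "loss_contribution": 0.125},
    {"family": "ACA", "in_loss": False, "loss_contribution": 0.75},
]

for family, total in loss_by_family(rows):
    print(f"{family}: {total}")
```

The same grouping could be keyed on state or geo_level instead of family, which is the kind of slicing the summary-level JSONs cannot support.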
Downstream consumer
This is needed by PolicyEngine/calibration-diagnostics PR #9 and follow-up API work. Once this artifact exists, the dashboard can add an endpoint like /microplex/targets or /microplex/diff to expose the full Microplex aggregate performance table, with optional us-data comparator columns.