The local PE-US rebuild smoke run progressed through CPS download/cache and PUF/demographics loading, then failed while loading the PUF provider because the local policyengine-us-data checkout does not contain policyengine_us_data/storage/soi.csv.
Command shape:
uv run --no-sync python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \
--output-root artifacts/local_us_microplex_smoke \
--version-id local-smoke-v1 \
--baseline-dataset /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \
--targets-db /Users/administrator/Documents/PolicyEngine/calibration-diagnostics/.artifacts/policy_data.db \
--policyengine-us-data-repo /Users/administrator/Documents/PolicyEngine/policyengine-us-data \
--calibration-backend microcalibrate \
--donor-imputer-backend zi_qrf \
--policyengine-materialize-batch-size 100000 \
--cps-sample-n 1000 --puf-sample-n 1000 --donor-sample-n 1000 \
--n-synthetic 1000 \
--defer-policyengine-harness \
--defer-policyengine-native-score \
--defer-native-audit \
--defer-imputation-ablation
Failure:
FileNotFoundError: Could not find PE SOI file at /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/soi.csv
Relevant traceback:
microplex_us/data_sources/puf.py:2253 load_frame
microplex_us/data_sources/puf.py:1970 _build_puf_tax_units
microplex_us/data_sources/puf.py:587 uprate_raw_puf_pe_style
microplex_us/data_sources/puf.py:486 _resolve_pe_soi_path
Observed local state:
- policyengine_us_data/storage/calibration_targets/soi_targets.csv exists and appears to have the schema used by _get_pe_soi_aggregate (Year, Variable, Filing status, AGI lower bound, AGI upper bound, Count, Taxable only, Value).
- policyengine_us_data/storage/download_prerequisites.py downloads PUF/demographics/np2023/geography prerequisites, but does not download or materialize soi.csv.
- The rebuild checkpoint CLI exposes --puf-path and --puf-demographics-path, but not a --soi-path, even though PUFSourceProvider has a soi_path field/filter.
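The schema observation in the first item can be checked with a short script. This is a minimal sketch, assuming the column names listed above are the exact CSV header; missing_soi_columns is a hypothetical helper and not part of microplex_us or policyengine-us-data.

```python
import csv
from pathlib import Path

# Columns listed in the observation above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "Year",
    "Variable",
    "Filing status",
    "AGI lower bound",
    "AGI upper bound",
    "Count",
    "Taxable only",
    "Value",
]


def missing_soi_columns(csv_path):
    """Return the expected columns that are absent from the CSV header row."""
    with Path(csv_path).open(newline="") as handle:
        # An empty file yields an empty header, so every column is reported missing.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

An empty return value only shows that the header matches; it does not confirm that the rows are equivalent to what the legacy soi.csv contained.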
This makes a fresh local policyengine-us-data checkout insufficient for the documented --policyengine-us-data-repo path unless soi.csv is manually supplied at the legacy location.
Local workaround I will try next: copy storage/calibration_targets/soi_targets.csv to storage/soi.csv in the local policyengine-us-data checkout and rerun the smoke command.
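The workaround can be scripted so it is repeatable across checkouts. This is a sketch under the assumption that a plain file copy is enough; stage_legacy_soi is a hypothetical helper name, and whether soi_targets.csv is a drop-in replacement for the legacy soi.csv is still untested.

```python
import shutil
from pathlib import Path


def stage_legacy_soi(repo_root, overwrite=False):
    """Copy calibration_targets/soi_targets.csv to the legacy storage/soi.csv path.

    repo_root is the root of a local policyengine-us-data checkout.
    Returns the path of the legacy file.
    """
    storage = Path(repo_root) / "policyengine_us_data" / "storage"
    source = storage / "calibration_targets" / "soi_targets.csv"
    target = storage / "soi.csv"

    if not source.is_file():
        raise FileNotFoundError(f"Missing source file: {source}")

    # Leave an existing soi.csv alone unless the caller asks to replace it.
    if target.exists() and not overwrite:
        return target

    shutil.copyfile(source, target)
    return target
```

Calling stage_legacy_soi with the same directory passed to --policyengine-us-data-repo should place the file where _resolve_pe_soi_path reported looking for it in the failure above.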