Remove dead ZIP_CODE_DATASET infrastructure#7695
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7695 +/- ##
============================================
+ Coverage 74.80% 100.00% +25.19%
============================================
Files 3907 5 -3902
Lines 56380 69 -56311
Branches 271 2 -269
============================================
- Hits 42177 69 -42108
+ Misses 14159 0 -14159
+ Partials 44 0 -44
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Is Claude blowing smoke? |
Consolidated Review Report: PR #7695 -- Remove Dead ZIP_CODE_DATASET InfrastructurePR: #7695 Executive SummaryThis PR removes the dead FindingsCRITICAL (Must Fix)None. All code changes are correct. The dead code removal leaves zero dangling references. The ACA guard logic uses proper sentinel patterns. CI passes. No hard-coded values, no crash risks introduced, no incorrect formulas. SHOULD ADDRESS (2 items)S1: Missing test for
|
| File | Change |
|---|---|
policyengine_us/data/zip_code_dataset.py |
Deleted |
policyengine_us/data/__init__.py |
Cleared (import removed) |
policyengine_us/variables/household/demographic/geographic/zip_code/zip_code.py |
Formula removed; input-only with default "UNKNOWN" |
policyengine_us/variables/household/demographic/geographic/zip_code/three_digit_zip_code.py |
Formula removed; input-only with default "" |
policyengine_us/variables/household/demographic/geographic/county/county.py |
Dead zip-code lookup branch removed |
policyengine_us/variables/gov/aca/slspc/slcsp_rating_area_la_county.py |
Guard against missing zip data added |
policyengine_us/variables/gov/aca/slspc/slcsp_age_0.py |
Guard against rating_area=0 added |
policyengine_us/tests/policy/baseline/gov/aca/slcsp/slcsp_rating_area_la_county.yaml |
New test: LA County with missing zip code |
changelog.d/fix-zip-code-dataset-crash.fixed.md |
Changelog fragment |
Overall Assessment
| Category | Status |
|---|---|
| Dead code removal complete | PASS |
| No dangling imports/references | PASS |
| ACA guard logic correct | PASS |
| No hard-coded values | PASS |
| Sentinel pattern valid | PASS |
| Changelog fragment correct | PASS |
| CI green | PASS (all 9 checks) |
| Test coverage -- happy paths | PASS |
| Test coverage -- new guard logic | PARTIAL (S1: slcsp_age_0 rating_area=0 untested) |
| Test coverage -- edge cases | PARTIAL (S2: unmapped zip prefix unverified) |
| Vectorization patterns | PASS |
| Entity-level correctness | PASS |
| No crash risks introduced | PASS |
Recommended Severity: COMMENT
- 0 critical issues -- nothing blocks merge
- 2 should-address items -- both are missing test coverage for new guard logic (not blocking, but valuable)
- 4 suggestions -- cosmetic/pre-existing cleanup opportunities
The PR is a clean, well-scoped infrastructure fix. The dead code removal is thorough, the guard logic is correct, and CI passes. The missing test for slcsp_age_0 with rating_area=0 is the most valuable follow-up action but does not block approval since the guard code itself is correct and the scenario requires unusual direct API input to trigger.
Generated with Claude Code
PavelMakarchuk
left a comment
There was a problem hiding this comment.
LGTM after test coverage is addressed and passes
PavelMakarchuk
left a comment
There was a problem hiding this comment.
Looks good — clean dead code removal, correct guard logic, CI green.
Suggestions:
- Add a test for
slcsp_age_0withrating_area=0to cover the new sentinel guard - Add a test for LA County with an unmapped zip prefix (e.g.,
"999") to verify behavior - Minor:
"Not an Los Angeles County"→"Not a Los Angeles County"in test names - Pre-existing: unused
Simulationimport inslcsp_age_0.py, redundantis_in_lafetch insidedefined_for = "in_la"
|
Thanks for the thorough review, Pavel! All suggestions addressed:
All tests passing (locally... we'll see) |
ZIP_CODE_DATASET loaded from zip_codes.csv.gz, a file that never existed in this repo. Every installation had ZIP_CODE_DATASET = None, making the zip_code formula dead code that crashed when reached via the LA County SLCSP rating area path on CA state-level H5 files. Removes zip_code_dataset.py, guts the zip_code and three_digit_zip_code formulas (keeping them as input-only variables), and removes the dead zip-code lookup branch from county.py. Fixes #7694, fixes #7664 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
slcsp_rating_area_la_county now returns rating area 0 when three_digit_zip_code is empty. slcsp_age_0 treats rating area 0 as unknown and returns cost 0, so households without zip data get no aca_ptc rather than crashing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add test for slcsp_age_0 with rating_area=0 (sentinel guard coverage)
- Add test for LA County with unmapped zip prefix and guard against crash
- Fix grammar in test name ("Not an" -> "Not a")
- Remove unused Simulation import and redundant is_in_la fetch
- Delete empty policyengine_us/data/ package
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5ca53e0 to
63a7861
Compare
The formula derives the 3-digit ZIP prefix from zip_code (e.g. 90019 → 900) and is independent of ZIP_CODE_DATASET. Its removal silently broke ACA PTC calculations for all LA County households that provide zip_code but not three_digit_zip_code, causing aca_ptc to return 0 instead of ~$5,055. Adds regression tests for the derivation and an integration test reproducing the Amplifi-reported ACA PTC bug. Fixes the regression introduced in 9d05912. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Restore three_digit_zip_code formula removed in #7695 The formula derives the 3-digit ZIP prefix from zip_code (e.g. 90019 → 900) and is independent of ZIP_CODE_DATASET. Its removal silently broke ACA PTC calculations for all LA County households that provide zip_code but not three_digit_zip_code, causing aca_ptc to return 0 instead of ~$5,055. Adds regression tests for the derivation and an integration test reproducing the Amplifi-reported ACA PTC bug. Fixes the regression introduced in 9d05912. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Simplify formula and fix import path - Use integer division (// 100) instead of float (// 1e2) in three_digit_zip_code - Import Simulation from policyengine_us public API, not policyengine_core internals - Add code deletion safety rule to CLAUDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
ZIP_CODE_DATASET, which loaded fromzip_codes.csv.gz— a file that never existed in this repo. Every installation hadZIP_CODE_DATASET = None.zip_codeformula was dead code that crashed when reached via the LA County SLCSP rating area path on CA state-level H5 files (AttributeError: 'NoneType' object has no attribute 'state').Changes:
zip_code_dataset.pyand its import fromdata/__init__.pyzip_codeandthree_digit_zip_code(kept as input-only variables)county.pyFixes #7694, fixes #7664
Test plan
three_digit_zip_codeas input)make formatclean🤖 Generated with Claude Code