Document measured local build system requirements by MaxGhenis · Pull Request #60 · PolicyEngine/populace

MaxGhenis · 2026-06-15T18:07:21Z

What

Adds SYSTEM_REQUIREMENTS.md — the measured memory, disk, and CPU footprint of
developing and building populace locally — and links it from the README. Docs
only; no code changes.

Why

There was no answer to "what does a machine need to build populace?" beyond the
solver docstring's dense-matrix estimate. This makes build-machine sizing and
contributor onboarding concrete, and surfaces one avoidable scaling ceiling.

Measured findings

Measured on Apple M5 Max / 128 GB / Python 3.14.4 / torch 2.12 (full certificate
+ repro scripts in the doc). Peak RSS is read from ru_maxrss, with one config
per process.
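
As a rough illustration of the measurement approach (not the benchmark scripts shipped in the doc), peak RSS can be read from `ru_maxrss` with the standard-library `resource` module. The helper name `peak_rss_bytes` and the placeholder workload are assumptions for this sketch. Because `ru_maxrss` is a high-water mark that never decreases, running one config per process keeps each reading independent of the others.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Return this process's peak resident set size in bytes."""
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    return raw if sys.platform == "darwin" else raw * 1024


if __name__ == "__main__":
    # Hypothetical workload standing in for a single benchmark config.
    workload = bytearray(64 * 1024 * 1024)  # allocate 64 MiB
    print(f"peak RSS: {peak_rss_bytes() / 2**30:.2f} GiB")
```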

- RAM is the binding resource, with two distinct ceilings:
  - Calibration compile peaks at ≈ 2 × n_targets × n_records × 8 bytes,
    because build_constraint_matrix densifies one row per target and vstacks
    before compressing to CSR. 300k records × 6,288 targets → 29 GB; this
    extrapolates to ~94 GB at 1M records and ~280 GB at 3M, so the
    generate-big rungs are cloud-only purely because of the compiler, not the
    solve.
  - SCF wealth imputation (27 chained QRF targets) → ~15 GB, >30 min,
    single-threaded. This is the heaviest single build stage.
- Full contract suite peaks at 11 GB (data loader 9 GB + policyengine-us
  adapter 4 GB); pure-library suites are <0.5 GB.
- Disk ≈ 18 GB (1 GB venv + 14 GB policyengine-us-data storage + 3.3 GB HF
  cache).
- Phases run sequentially, so the end-to-end build peak is the max of stage
  peaks (~29 GB), not their sum.
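
The extrapolation above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: the function names are hypothetical, and the reported "GB" figures are read as GiB (2^30 bytes), which is what makes the formula line up with ~94 and ~280. The measured 29 GB at 300k records sits slightly above the formula's estimate, as expected for an approximation.

```python
GIB = 2**30
BYTES_PER_FLOAT64 = 8


def compile_peak_gib(n_records: int, n_targets: int) -> float:
    """Estimated compile peak, 2 x n_targets x n_records x 8 bytes, in GiB."""
    return 2 * n_targets * n_records * BYTES_PER_FLOAT64 / GIB


def build_peak_gb(stage_peaks_gb: dict[str, float]) -> float:
    """Stages run sequentially, so the build peak is the largest stage peak."""
    return max(stage_peaks_gb.values())


if __name__ == "__main__":
    for n_records in (300_000, 1_000_000, 3_000_000):
        estimate = compile_peak_gib(n_records, n_targets=6_288)
        print(f"{n_records:>9,} records: ~{estimate:.1f} GiB")

    # Stage peaks taken from the measured findings above.
    stages = {"calibration compile": 29.0, "SCF wealth imputation": 15.0}
    print(f"end-to-end build peak: ~{build_peak_gb(stages):.0f} GB")
```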

Flagged follow-up

The calibration compile densification is the one avoidable ceiling: building the
CSR incrementally (accumulate data/indices/indptr per row, or
scipy.sparse.vstack over per-row sparse rows) would shrink compile RAM from the
dense n_targets × n_records footprint to roughly the size of the nonzero
entries, and move the 1M–3M rungs within reach of a large box. The existing
test_sparse_solve.py equivalence tests cover the behavior this must preserve.
Noted in the doc's Follow-up section; not addressed here.
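
A minimal sketch of the "accumulate data/indices/indptr per row" idea, in pure Python so it runs without dependencies. It is not populace's build_constraint_matrix; the names `build_csr_incrementally` and `example_rows` are hypothetical, and the rows are assumed to arrive one at a time from an iterator, so only a single row plus the nonzeros seen so far is held in memory.

```python
from typing import Iterable, Iterator


def build_csr_incrementally(
    rows: Iterable[Iterable[float]],
) -> tuple[list[float], list[int], list[int]]:
    """Accumulate CSR data/indices/indptr one row at a time."""
    data: list[float] = []     # nonzero values, row by row
    indices: list[int] = []    # column index of each nonzero value
    indptr: list[int] = [0]    # indptr[i]:indptr[i + 1] slices row i
    for row in rows:
        for col, value in enumerate(row):
            if value != 0.0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def example_rows() -> Iterator[list[float]]:
    """Hypothetical stand-in for per-target constraint rows."""
    yield [0.0, 2.0, 0.0, 0.0]
    yield [0.0, 0.0, 0.0, 0.0]
    yield [1.0, 0.0, 0.0, 3.0]


data, indices, indptr = build_csr_incrementally(example_rows())
```

With SciPy available, the three lists can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_targets, n_records))` to obtain the matrix without ever materializing the dense array.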

Reproducibility

The doc carries an environment certificate and the two benchmark scripts
(calibration + imputation), so the numbers can be refreshed on any machine.

🤖 Generated with Claude Code

Add SYSTEM_REQUIREMENTS.md: measured memory/disk/CPU footprint of developing
and building populace locally, plus a hardware-budget table. Link it from the
README development section.

Key measured findings (M5 Max, 128 GB, Python 3.14.4 / torch 2.12):
- RAM is the binding constraint, with two distinct ceilings:
  - calibration compile peaks at ~2 x n_targets x n_records x 8 bytes
    (build_constraint_matrix densifies rows then vstacks before CSR):
    300k x 6,288 targets -> 29 GB; extrapolates to ~94 GB at 1M records.
  - SCF wealth imputation (27 chained QRF targets) -> ~15 GB, >30 min,
    single-threaded.
- Full contract suite peaks at 11 GB (data loader 9 GB + PE-US adapter 4 GB);
  pure-library suites are <0.5 GB.
- Disk: ~18 GB (1 GB venv + 14 GB us-data storage + 3.3 GB HF cache).
- Sequential phases -> end-to-end build peak is the max of stages (~29 GB).

Includes an environment certificate and the reproduction scripts. Flags the
calibration compile densification as an avoidable ceiling (build CSR
incrementally) that currently caps local builds below the generate-big rungs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MaxGhenis force-pushed the docs/system-requirements branch from e146904 to 83bee59 on June 15, 2026 18:13

MaxGhenis wants to merge 1 commit into main from docs/system-requirements
