CI workflow takes 2.5+ hours due to sequential data builds #477

@baogorek

Problem

The "PR code changes" workflow takes 2.5+ hours to complete, far longer than the expected ~45 minutes. The bottleneck is the Modal data build step (modal_app/data_build.py).

Root Cause

The Modal data build runs every step sequentially, with no parallelization:

First pass (with TEST_LITE):

  1. Download prerequisites
  2. uprating.py
  3. acs.py
  4. cps.py
  5. irs_puf.py
  6. puf.py
  7. extended_cps.py
  8. enhanced_cps.py ← Likely the slowest (calibration/reweighting)
  9. small_enhanced_cps.py

Second pass (LOCAL_AREA_CALIBRATION, which runs the full build rather than the lite one):

  1. cps.py again
  2. puf.py again
  3. extended_cps.py again
  4. create_stratified_cps.py

Then tests:

  1. Local area calibration tests
  2. Main test suite
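
As a quick check of the overlap between the two passes, the script lists above can be compared directly. This is only a sketch: the lists are copied from this issue, not read from modal_app/data_build.py, and `repeated_scripts` is a hypothetical helper name.

```python
# Script names copied from the two passes listed in this issue
# (not read from modal_app/data_build.py).
FIRST_PASS = [
    "uprating.py",
    "acs.py",
    "cps.py",
    "irs_puf.py",
    "puf.py",
    "extended_cps.py",
    "enhanced_cps.py",
    "small_enhanced_cps.py",
]
SECOND_PASS = [
    "cps.py",
    "puf.py",
    "extended_cps.py",
    "create_stratified_cps.py",
]


def repeated_scripts(first, second):
    """Return the scripts that appear in both passes, in first-pass order."""
    second_set = set(second)
    return [script for script in first if script in second_set]


print(repeated_scripts(FIRST_PASS, SECOND_PASS))
# → ['cps.py', 'puf.py', 'extended_cps.py']
```

Since the first pass runs with TEST_LITE and the second runs full, the repeated scripts may not produce interchangeable outputs, so this only shows which scripts run twice, not how much work could be skipped.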

Issues

  1. No parallelization - All scripts run sequentially even though 8 CPUs are available
  2. Duplicate builds - Several datasets are built twice (once for the main build, once for local area calibration)
  3. Expensive step - enhanced_cps.py involves costly optimization/calibration operations
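
To illustrate the first issue, here is a minimal sketch of dependency-aware scheduling with the standard library. The dependency edges in `ASSUMED_DEPS` are guesses made for illustration; the real dependencies between the scripts would need to be confirmed before doing anything like this. `run_in_parallel` and `run_step` are hypothetical names, and the demo runner only prints instead of launching a script.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# ASSUMPTION: these edges are illustrative guesses, not taken from the repo.
# Each key maps to the set of scripts that must finish before it starts.
ASSUMED_DEPS = {
    "uprating.py": set(),
    "acs.py": set(),
    "irs_puf.py": set(),
    "cps.py": {"uprating.py", "acs.py"},
    "puf.py": {"uprating.py", "irs_puf.py"},
    "extended_cps.py": {"cps.py", "puf.py"},
    "enhanced_cps.py": {"extended_cps.py"},
    "small_enhanced_cps.py": {"enhanced_cps.py"},
}


def run_in_parallel(deps, run_step, max_workers=8):
    """Run each step once all of its dependencies have finished.

    Returns the steps in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    pending = {}  # future -> step name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every step whose dependencies are already done.
            for step in sorter.get_ready():
                pending[pool.submit(run_step, step)] = step
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                step = pending.pop(future)
                future.result()  # re-raise any exception from the step
                sorter.done(step)
                finished.append(step)
    return finished


def demo_step(name):
    # Stand-in for actually launching the script.
    print(f"running {name}")


completion_order = run_in_parallel(ASSUMED_DEPS, demo_step)
```

In a real build, `run_step` could launch each script with `subprocess.run(..., check=True)`; threads are sufficient in that case because the heavy work happens in the child processes. Whether the scripts are safe to run concurrently (shared files, memory limits) is a separate question this sketch does not answer.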

Observed

  • Run #21254423149 took 2+ hours
  • The job appeared to hang/stall at times during the Modal step

Potential Solutions

  • Parallelize independent data build steps
  • Cache intermediate results between the two build passes
  • Profile individual scripts to identify specific bottlenecks
  • Consider whether both build passes are necessary for every PR
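
For the profiling suggestion, a small timing wrapper around each step might be enough to find the slow ones before changing anything else. This is a sketch only: `timed` and `slowest_first` are hypothetical helpers, and the `time.sleep` calls stand in for real script invocations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the step raises, so failed steps are still timed.
        store[label] = time.perf_counter() - start


def slowest_first(store):
    """Return (label, seconds) pairs sorted from slowest to fastest."""
    return sorted(store.items(), key=lambda item: item[1], reverse=True)


timings = {}
with timed("enhanced_cps.py", timings):
    time.sleep(0.05)  # placeholder for running the real script
with timed("acs.py", timings):
    time.sleep(0.01)  # placeholder for running the real script

for label, seconds in slowest_first(timings):
    print(f"{label}: {seconds:.2f}s")
```

Logging these durations per step in CI would also show whether the apparent hangs come from one script or from the Modal step as a whole.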
