Open
Description
Problem
The PR code changes workflow is taking 2.5+ hours to complete, far longer than the expected ~45 minutes. The bottleneck is in the Modal data build step (modal_app/data_build.py).
Root Cause
The Modal data build runs everything sequentially with no parallelization:
First pass (with TEST_LITE):
- Download prerequisites
- uprating.py
- acs.py
- cps.py
- irs_puf.py
- puf.py
- extended_cps.py
- enhanced_cps.py ← Likely the slowest (calibration/reweighting)
- small_enhanced_cps.py
Second pass (LOCAL_AREA_CALIBRATION - runs full, not lite):
- cps.py (again)
- puf.py (again)
- extended_cps.py (again)
- create_stratified_cps.py
Then tests:
- Local area calibration tests
- Main test suite
Issues
- No parallelization - All scripts run sequentially despite having 8 CPUs available
- Duplicate builds - Several datasets are built twice (once for main, once for local area calibration)
- enhanced_cps.py involves expensive optimization/calibration operations
Observed
- Run #21254423149 took 2+ hours
- The job appeared to hang/stall at times during the Modal step
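To tell a genuine stall apart from a slow step, each script could be run through a small timing wrapper that logs wall-clock time and enforces a timeout. This is only a sketch: `run_timed` and the stand-in commands are hypothetical and are not taken from modal_app/data_build.py.

```python
import subprocess
import sys
import time


def run_timed(name, cmd, timeout_s=None):
    """Run one build step; return (name, elapsed seconds, exit code)."""
    start = time.perf_counter()
    try:
        code = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # Report a timeout as exit code -1 so stalls show up in the log
        code = -1
    elapsed = time.perf_counter() - start
    print(f"[timing] {name}: {elapsed:.1f}s (exit {code})", flush=True)
    return name, elapsed, code


if __name__ == "__main__":
    # Stand-in commands (assumption); the real build would pass the
    # dataset scripts here instead.
    steps = [
        ("fast_step", [sys.executable, "-c", "pass"]),
        ("slow_step", [sys.executable, "-c", "import time; time.sleep(0.2)"]),
    ]
    results = [run_timed(name, cmd, timeout_s=60) for name, cmd in steps]
    print("slowest:", max(results, key=lambda r: r[1])[0])
```

Flushing each log line as the step finishes means the last line printed before a hang identifies the step that stalled.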
Potential Solutions
- Parallelize independent data build steps
- Cache intermediate results between the two build passes
- Profile individual scripts to identify specific bottlenecks
- Consider whether both build passes are necessary for every PR
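As a rough illustration of the first option, a dependency-aware runner could start each script as soon as its prerequisites finish. The sketch below uses the standard-library `graphlib` and `concurrent.futures` modules (Python 3.9+). The dependency edges in `DEPS` are guesses based on the script names only and would need to be checked against the actual build; `run_graph` and `run_script` are hypothetical names.

```python
import subprocess
import sys
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# ASSUMED dependency edges (step -> prerequisites), inferred from script
# names only. Verify against the real build before relying on them.
DEPS = {
    "uprating.py": set(),
    "acs.py": set(),
    "irs_puf.py": set(),
    "cps.py": {"uprating.py"},
    "puf.py": {"irs_puf.py", "uprating.py"},
    "extended_cps.py": {"cps.py", "puf.py"},
    "enhanced_cps.py": {"extended_cps.py"},
    "small_enhanced_cps.py": {"enhanced_cps.py"},
}


def run_graph(deps, run_step, max_workers=8):
    """Run steps in dependency order, up to max_workers at a time.

    Returns the steps in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every step whose prerequisites are all done
            for step in sorter.get_ready():
                running[pool.submit(run_step, step)] = step
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                step = running.pop(future)
                future.result()  # re-raise any failure from the step
                sorter.done(step)
                finished.append(step)
    return finished


def run_script(path):
    """Hypothetical step runner: execute one script in a subprocess."""
    subprocess.run([sys.executable, path], check=True)


if __name__ == "__main__":
    # Dry run with a no-op step to inspect the resulting order
    print(run_graph(DEPS, lambda step: None))
```

Threads are sufficient here because each worker only waits on a subprocess. With the assumed edges, `enhanced_cps.py` still sits on the critical path, so parallelism alone would not shorten that step.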