Merged
120 changes: 53 additions & 67 deletions ROADMAP.md
@@ -6,70 +6,79 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).

---

## Current Status
## Current Status (v2.9.0)

diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination:
diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** that no R or Python package matches.

- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
- **Study design**: Power analysis tools
- **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix.
- **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
### Estimators

---
- **Core**: Basic DiD, TWFE, MultiPeriod event study
- **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024)
- **Specialized**: Synthetic DiD (Arkhangelsky et al. 2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP
- **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates
- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — saturated OLS (direct cohort x time coefficients), logit, and Poisson QMLE (ASF-based ATT with delta-method SEs)

### Inference & Diagnostics

## Near-Term Enhancements (v2.8)
- Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
- Parallel trends tests, placebo tests, Goodman-Bacon decomposition
- Honest DiD sensitivity analysis (Rambachan & Roth 2023), pre-trends power analysis (Roth 2022)
- Power analysis and simulation-based MDE tools
- EPV diagnostics for propensity score estimation

### Survey Phase 7: Completing the Survey Story
### Survey Support

Close the remaining gaps for practitioners using major population surveys
(ACS, CPS, BRFSS, MEPS). See [survey-roadmap.md](docs/survey-roadmap.md) for
full details.
`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 15 of 16 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:

- **CS Covariates + IPW/DR + Survey** *(Implemented)*: DRDID nuisance IF
corrections (PS + OR) under survey weights for all estimation methods.
- **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for
CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020,
Section 4). Supports BRFSS, ACS annual, CPS monthly.
- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating
the full workflow with realistic survey data.
- **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full
event-study VCV propagated to sensitivity analysis, with bootstrap/replicate
diagonal fallback.
- **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC
- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 16 estimators (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
- **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based)
- **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates**
- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
- **R cross-validation**: 15 tests against R's `survey` package using NHANES, RECS, and API datasets
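
The replicate-weight schemes named in the list above differ mainly in the scale factor applied to the squared deviations of the replicate estimates from the full-sample estimate. The following is a minimal sketch in plain NumPy, not diff-diff's actual code: `replicate_variance` and its arguments are hypothetical names, and only the textbook scale factors for BRR, Fay's BRR, and JK1 are shown (JKn and SDR use per-replicate or design-specific factors and are left out).

```python
import numpy as np


def replicate_variance(theta_full, theta_reps, method="JK1", fay_rho=0.0):
    """Variance of an estimate from its replicate re-estimates.

    Hypothetical helper for illustration; not part of diff-diff.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    n_reps = theta_reps.size
    # Sum of squared deviations of replicate estimates from the full-sample one.
    sq_dev = np.sum((theta_reps - theta_full) ** 2)
    if method == "BRR":
        scale = 1.0 / n_reps
    elif method == "Fay":
        scale = 1.0 / (n_reps * (1.0 - fay_rho) ** 2)
    elif method == "JK1":
        scale = (n_reps - 1.0) / n_reps
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return scale * sq_dev


# Toy example: a mean over four PSUs with delete-one (JK1) replicate weights.
y = np.array([2.0, 4.0, 6.0, 8.0])
full_w = np.ones(4)
# Column r drops PSU r and rescales the remaining weights by R / (R - 1).
rep_w = (np.ones((4, 4)) - np.eye(4)) * 4.0 / 3.0

theta = np.average(y, weights=full_w)
theta_reps = [np.average(y, weights=rep_w[:, r]) for r in range(4)]
var_jk1 = replicate_variance(theta, theta_reps, method="JK1")
```

For a simple mean with equal weights, the JK1 result coincides with the usual `s**2 / n`, which makes this toy case easy to check by hand.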

### Staggered Triple Difference (DDD) *(Implemented)*
See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix, and [survey-roadmap.md](docs/survey-roadmap.md) for implementation details.

`StaggeredTripleDifference` estimator for staggered adoption DDD settings.
**Gap**: WooldridgeDiD does not yet accept `survey_design`. Planned for Phase 10f.

- Group-time ATT(g,t) for DDD designs with variation in treatment timing
- Event study aggregation and pre-treatment placebo effects
- Multiplier bootstrap for valid inference in staggered settings
- Full survey support (pweight, strata/PSU/FPC, replicate weights)
### Infrastructure

**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`.
- Optional Rust backend for accelerated computation
- Label-gated CI (tests run only when `ready-for-ci` label is added)
- Documentation dependency map (`docs/doc-deps.yaml`) with `/docs-impact` skill
- AI practitioner guardrails based on Baker et al. (2025) 8-step workflow

---

## Medium-Term Enhancements
## Active Work: Survey Academic Credibility (Phase 10)

### Efficient DiD Estimators
Before broadly announcing survey capability, we are establishing the theoretical
and empirical foundation needed for credibility with practitioners and
methodologists. See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs.

Semiparametrically efficient versions of existing DiD/event-study estimators with 40%+ precision gains over current methods.
| Item | Priority | Status |
|------|----------|--------|
| **10a.** Theory document (`survey-theory.md`) | HIGH | Not started |
| **10b.** Research-grade survey DGP (enhance `generate_survey_did_data`) | HIGH | Not started |
| **10c.** Expand R validation (ImputationDiD, StackedDiD, SunAbraham, TripleDifference) | HIGH | Not started |
| **10d.** Tutorial: flat-weight vs design-based comparison | HIGH | Not started — depends on 10b |
| **10e.** Position paper / arXiv preprint | MEDIUM | Not started — depends on 10b |
| **10f.** WooldridgeDiD survey support (OLS + logit + Poisson) | MEDIUM | Not started |
| **10g.** Practitioner guidance: when does survey design matter? | LOW | Not started |

---

**Reference**: [Chen, Sant'Anna & Xie (2025)](https://arxiv.org/abs/2506.17729). *Working Paper*.
## Future Estimators

### de Chaisemartin-D'Haultfœuille Estimator
### de Chaisemartin-D'Haultfouille Estimator

Handles treatment that switches on and off (reversible treatments), unlike most other methods.

- Allows units to move into and out of treatment
- Time-varying, heterogeneous treatment effects
- Comparison with never-switchers or flexible control groups

**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
**Reference**: [de Chaisemartin & D'Haultfouille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.

### Local Projections DiD

@@ -79,13 +88,7 @@ Implements local projections for dynamic treatment effects. Doesn't require spec
- Robust to misspecification of dynamics
- Natural handling of anticipation effects

**Reference**: Dube, Girardi, Jordà, and Taylor (2023).

### Nonlinear DiD

Implemented in `WooldridgeDiD` (alias `ETWFE`) — OLS, Poisson QMLE, and logit paths with ASF-based ATT. See [Tutorial 16](docs/tutorials/16_wooldridge_etwfe.ipynb).

**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
**Reference**: Dube, Girardi, Jorda, and Taylor (2023).

### Causal Duration Analysis with DiD

@@ -98,7 +101,7 @@ Extends DiD to duration/survival outcomes where standard methods fail (hazard ra

---

## Long-Term Research Directions (v3.0+)
## Long-Term Research Directions

Frontier methods requiring more research investment.

@@ -110,12 +113,11 @@ Standard DiD assumes SUTVA; spatial/network spillovers violate this. Two-stage i

### Quantile/Distributional DiD

Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT. Goes beyond "what's the average effect" to "who gains, who loses."
Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT.

- Changes-in-Changes (CiC) identification strategy
- QTT(τ) at user-specified quantiles
- QTT(tau) at user-specified quantiles
- Full counterfactual distribution function
- Two-period foundation, then staggered extension

**Reference**: [Athey & Imbens (2006)](https://onlinelibrary.wiley.com/doi/10.1111/j.1468-0262.2006.00668.x). *Econometrica*.
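
As a rough illustration of the two-period Changes-in-Changes idea for continuous outcomes: each treated unit's pre-period outcome is ranked within the control group's pre-period distribution, and that rank is mapped through the control group's post-period quantile function to obtain a counterfactual untreated outcome. The sketch below uses plain NumPy on simulated data; `cic_qtt` and its arguments are hypothetical names, and nothing here reflects a diff-diff API, since this estimator is only a roadmap item.

```python
import numpy as np


def cic_qtt(y00, y01, y10, y11, taus):
    """QTT(tau) under a two-period Changes-in-Changes model (sketch).

    y00, y01: control group outcomes, pre and post period.
    y10, y11: treated group outcomes, pre and post period.
    """
    y00_sorted = np.sort(np.asarray(y00, dtype=float))
    y10 = np.asarray(y10, dtype=float)
    # Empirical CDF of control/pre outcomes, evaluated at treated/pre outcomes.
    ranks = np.searchsorted(y00_sorted, y10, side="right") / y00_sorted.size
    # Map ranks through the control/post quantile function to get
    # counterfactual untreated post-period outcomes for the treated group.
    y11_cf = np.quantile(np.asarray(y01, dtype=float), ranks)
    taus = np.asarray(taus, dtype=float)
    return np.quantile(np.asarray(y11, dtype=float), taus) - np.quantile(y11_cf, taus)


# Simulated data: a common time shift of +1 and a constant treatment effect of 2.
rng = np.random.default_rng(0)
n = 20_000
y00 = rng.normal(0.0, 1.0, n)
y01 = rng.normal(1.0, 1.0, n)
y10 = rng.normal(0.5, 1.0, n)
y11 = rng.normal(3.5, 1.0, n)

qtt = cic_qtt(y00, y01, y10, y11, taus=[0.25, 0.5, 0.75])
```

With a constant effect, the estimated QTT should be close to 2 at every quantile; heterogeneous effects would show up as a QTT that varies with tau.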

@@ -129,10 +131,6 @@ ML-powered conditional ATT — discover who benefits most from treatment using d

Machine learning methods for discovering heterogeneous treatment effects in DiD settings.

- Estimate treatment effect heterogeneity across covariates
- Data-driven subgroup discovery
- Honest confidence intervals for discovered heterogeneity

**References**:
- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
- Athey & Wager (2019). *Annals of Statistics*.
@@ -141,18 +139,12 @@ Machine learning methods for discovering heterogeneous treatment effects in DiD

Unified framework encompassing synthetic control and regression approaches.

- Nuclear norm regularization for low-rank structure
- Bridges synthetic control (few units, many periods) and regression (many units, few periods)

**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.

### Double/Debiased ML for DiD

For high-dimensional settings with many potential confounders.

- ML for nuisance parameter estimation (propensity, outcome models)
- Cross-fitting for valid inference

**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.

### Alternative Inference Methods
@@ -163,15 +155,9 @@ For high-dimensional settings with many potential confounders.

---

## Infrastructure Improvements

- Video tutorials and worked examples

---

## Contributing

Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
Interested in contributing? The Phase 10 items and future estimators are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.

Key references for implementation:
- [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
10 changes: 0 additions & 10 deletions TODO.md
@@ -58,19 +58,11 @@ Deferred items from PR reviews that were not addressed before merge.
|-------|----------|----|----------|
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
| ImputationDiD survey pretrends: subpopulation approach implemented (full design with zero-padded scores). Resolved in #260. | `imputation.py` | #260 | Resolved |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
| CallawaySantAnna survey + covariates + IPW/DR — **Resolved**. DRDID panel nuisance IF corrections (PS + OR) implemented for both survey and non-survey DR paths (Phase 7a). IPW path unblocked. | `staggered.py` | #233 | Resolved |
| SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
| Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
| StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium |
| StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium |
@@ -184,8 +176,6 @@ Features in R's `did` package that block porting additional tests:

| Feature | R tests blocked | Priority | Status |
|---------|----------------|----------|--------|
| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | High | **Resolved** — Phase 7b: `panel=False` on CallawaySantAnna |
| Sampling/population weights | 7 tests incl. all JEL replication | Medium | **Resolved** (Phases 1-6 + 7a: CS IPW/DR + covariates + survey) |
| Calendar time aggregation | 1 test in test-att_gt.R | Low | |

---