From ec223cd26064199e9645eda7b902d2b211ea2658 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 5 Apr 2026 08:14:39 -0400 Subject: [PATCH 1/6] Update documentation for v2.9.0: add WooldridgeDiD, refresh roadmap and survey docs - ROADMAP.md: full rewrite reflecting v2.9.0 status, Phase 10 active work - survey-roadmap.md: add Phase 10 credibility items, simplify structure, fix stale refs - llms-full.txt: add WooldridgeDiD, StaggeredTripleDifference, survey support sections - llms.txt: add WooldridgeDiD, SDDD, survey section, missing tutorials - llms-practitioner.txt: add WooldridgeDiD to estimator decision tree - choosing_estimator.rst: add WooldridgeDiD/TripleDiff/SDDD to table and flowchart, remove stale EfficientDiD covariates+survey note - doc-deps.yaml: add WooldridgeDiD group and source mappings - TODO.md: remove 8 resolved items Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 120 ++++---- TODO.md | 10 - docs/choosing_estimator.rst | 18 +- docs/doc-deps.yaml | 26 +- docs/llms-full.txt | 170 +++++++++++ docs/llms-practitioner.txt | 6 +- docs/llms.txt | 18 +- docs/survey-roadmap.md | 545 +++++++++++++----------------------- 8 files changed, 482 insertions(+), 431 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 7dfa6fb5..68824d8d 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6,62 +6,71 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). --- -## Current Status +## Current Status (v2.9.0) -diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination: +diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** that no R or Python package matches. 
-- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) -- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance -- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition -- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) -- **Study design**: Power analysis tools -- **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix. -- **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) +### Estimators ---- +- **Core**: Basic DiD, TWFE, MultiPeriod event study +- **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024) +- **Specialized**: Synthetic DiD (Arkhangelsky et al. 
2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP +- **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates +- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — OLS, logit, and Poisson QMLE with ASF-based ATT and delta-method SEs + +### Inference & Diagnostics -## Near-Term Enhancements (v2.8) +- Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance +- Parallel trends tests, placebo tests, Goodman-Bacon decomposition +- Honest DiD sensitivity analysis (Rambachan & Roth 2023), pre-trends power analysis (Roth 2022) +- Power analysis and simulation-based MDE tools +- EPV diagnostics for propensity score estimation -### Survey Phase 7: Completing the Survey Story +### Survey Support -Close the remaining gaps for practitioners using major population surveys -(ACS, CPS, BRFSS, MEPS). See [survey-roadmap.md](docs/survey-roadmap.md) for -full details. +`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 15 estimators accept survey weights; design-based variance estimation varies by estimator: -- **CS Covariates + IPW/DR + Survey** *(Implemented)*: DRDID nuisance IF - corrections (PS + OR) under survey weights for all estimation methods. -- **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for - CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020, - Section 4). Supports BRFSS, ACS annual, CPS monthly. -- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating - the full workflow with realistic survey data. -- **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full - event-study VCV propagated to sensitivity analysis, with bootstrap/replicate - diagonal fallback. 
+- **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC +- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators +- **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based) +- **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates** +- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS +- **R cross-validation**: 15 tests against R's `survey` package using NHANES, RECS, and API datasets -### Staggered Triple Difference (DDD) *(Implemented)* +See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix, and [survey-roadmap.md](docs/survey-roadmap.md) for implementation details. -`StaggeredTripleDifference` estimator for staggered adoption DDD settings. +**Gap**: WooldridgeDiD does not yet accept `survey_design`. Planned for Phase 10f. -- Group-time ATT(g,t) for DDD designs with variation in treatment timing -- Event study aggregation and pre-treatment placebo effects -- Multiplier bootstrap for valid inference in staggered settings -- Full survey support (pweight, strata/PSU/FPC, replicate weights) +### Infrastructure -**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`. +- Optional Rust backend for accelerated computation +- Label-gated CI (tests run only when `ready-for-ci` label is added) +- Documentation dependency map (`docs/doc-deps.yaml`) with `/docs-impact` skill +- AI practitioner guardrails based on Baker et al. (2025) 8-step workflow --- -## Medium-Term Enhancements +## Active Work: Survey Academic Credibility (Phase 10) -### Efficient DiD Estimators +Before broadly announcing survey capability, we are establishing the theoretical +and empirical foundation needed for credibility with practitioners and +methodologists. 
See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs. -Semiparametrically efficient versions of existing DiD/event-study estimators with 40%+ precision gains over current methods. +| Item | Priority | Status | +|------|----------|--------| +| **10a.** Theory document (`survey-theory.md`) | HIGH | Not started | +| **10b.** Research-grade survey DGP (enhance `generate_survey_did_data`) | HIGH | Not started | +| **10c.** Expand R validation (ImputationDiD, StackedDiD, SunAbraham, TripleDifference) | HIGH | Not started | +| **10d.** Tutorial: flat-weight vs design-based comparison | HIGH | Not started — depends on 10b | +| **10e.** Position paper / arXiv preprint | MEDIUM | Not started — depends on 10b | +| **10f.** WooldridgeDiD survey support (OLS + logit + Poisson) | MEDIUM | Not started | +| **10g.** Practitioner guidance: when does survey design matter? | LOW | Not started | + +--- -**Reference**: [Chen, Sant'Anna & Xie (2025)](https://arxiv.org/abs/2506.17729). *Working Paper*. +## Future Estimators -### de Chaisemartin-D'Haultfœuille Estimator +### de Chaisemartin-D'Haultfouille Estimator Handles treatment that switches on and off (reversible treatments), unlike most other methods. @@ -69,7 +78,7 @@ Handles treatment that switches on and off (reversible treatments), unlike most - Time-varying, heterogeneous treatment effects - Comparison with never-switchers or flexible control groups -**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*. +**Reference**: [de Chaisemartin & D'Haultfouille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*. ### Local Projections DiD @@ -79,13 +88,7 @@ Implements local projections for dynamic treatment effects. Doesn't require spec - Robust to misspecification of dynamics - Natural handling of anticipation effects -**Reference**: Dube, Girardi, Jordà, and Taylor (2023). 
- -### Nonlinear DiD - -Implemented in `WooldridgeDiD` (alias `ETWFE`) — OLS, Poisson QMLE, and logit paths with ASF-based ATT. See [Tutorial 16](docs/tutorials/16_wooldridge_etwfe.ipynb). - -**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*. +**Reference**: Dube, Girardi, Jorda, and Taylor (2023). ### Causal Duration Analysis with DiD @@ -98,7 +101,7 @@ Extends DiD to duration/survival outcomes where standard methods fail (hazard ra --- -## Long-Term Research Directions (v3.0+) +## Long-Term Research Directions Frontier methods requiring more research investment. @@ -110,12 +113,11 @@ Standard DiD assumes SUTVA; spatial/network spillovers violate this. Two-stage i ### Quantile/Distributional DiD -Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT. Goes beyond "what's the average effect" to "who gains, who loses." +Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT. - Changes-in-Changes (CiC) identification strategy -- QTT(τ) at user-specified quantiles +- QTT(tau) at user-specified quantiles - Full counterfactual distribution function -- Two-period foundation, then staggered extension **Reference**: [Athey & Imbens (2006)](https://onlinelibrary.wiley.com/doi/10.1111/j.1468-0262.2006.00668.x). *Econometrica*. @@ -129,10 +131,6 @@ ML-powered conditional ATT — discover who benefits most from treatment using d Machine learning methods for discovering heterogeneous treatment effects in DiD settings. -- Estimate treatment effect heterogeneity across covariates -- Data-driven subgroup discovery -- Honest confidence intervals for discovered heterogeneity - **References**: - [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*. - Athey & Wager (2019). *Annals of Statistics*. 
@@ -141,18 +139,12 @@ Machine learning methods for discovering heterogeneous treatment effects in DiD Unified framework encompassing synthetic control and regression approaches. -- Nuclear norm regularization for low-rank structure -- Bridges synthetic control (few units, many periods) and regression (many units, few periods) - **Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*. ### Double/Debiased ML for DiD For high-dimensional settings with many potential confounders. -- ML for nuisance parameter estimation (propensity, outcome models) -- Cross-fitting for valid inference - **Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*. ### Alternative Inference Methods @@ -163,15 +155,9 @@ For high-dimensional settings with many potential confounders. --- -## Infrastructure Improvements - -- Video tutorials and worked examples - ---- - ## Contributing -Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. +Interested in contributing? The Phase 10 items and future estimators are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. Key references for implementation: - [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*. diff --git a/TODO.md b/TODO.md index cc1d164f..5afc6274 100644 --- a/TODO.md +++ b/TODO.md @@ -58,19 +58,11 @@ Deferred items from PR reviews that were not addressed before merge. 
|-------|----------|----|----------| | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low | | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) | -| ImputationDiD survey pretrends: subpopulation approach implemented (full design with zero-padded scores). Resolved in #260. | `imputation.py` | #260 | Resolved | | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium | -| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved | -| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved | -| CallawaySantAnna survey + covariates + IPW/DR — **Resolved**. DRDID panel nuisance IF corrections (PS + OR) implemented for both survey and non-survey DR paths (Phase 7a). IPW path unblocked. | `staggered.py` | #233 | Resolved | -| SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved | -| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. 
Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved | | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low | | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low | -| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved | | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low | | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low | -| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. 
| `results.py` + 10 results files | — | Resolved | | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low | | StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium | | StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium | @@ -184,8 +176,6 @@ Features in R's `did` package that block porting additional tests: | Feature | R tests blocked | Priority | Status | |---------|----------------|----------|--------| -| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | High | **Resolved** — Phase 7b: `panel=False` on CallawaySantAnna | -| Sampling/population weights | 7 tests incl. 
all JEL replication | Medium | **Resolved** (Phases 1-6 + 7a: CS IPW/DR + covariates + survey) | | Calendar time aggregation | 1 test in test-att_gt.R | Low | | --- diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index 34bbf612..a74269cc 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -22,6 +22,7 @@ Start here and follow the questions: - **No** → Go to question 2 - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All) - **Yes, and you suspect homogeneous effects** → Use :class:`~diff_diff.ImputationDiD` or :class:`~diff_diff.TwoStageDiD` for tighter CIs + - **Yes, with nonlinear outcome (binary/count)** → Use :class:`~diff_diff.WooldridgeDiD` with ``method='logit'`` or ``method='poisson'`` - **Want to diagnose TWFE bias?** → Use :class:`~diff_diff.BaconDecomposition` first 2. **Do you have panel data?** (Multiple observations per unit over time) @@ -97,6 +98,18 @@ Quick Reference - Factor confounding suspected - Factor model + weights - ATT with triple robustness + * - ``TripleDifference`` + - Two eligibility criteria (DDD) + - Parallel trends for both dimensions + - DDD ATT (regression, IPW, or DR) + * - ``StaggeredTripleDifference`` + - Staggered DDD with treatment timing + - Conditional parallel trends (DDD) + - Group-time ATT(g,t), aggregations + * - ``WooldridgeDiD`` + - Nonlinear outcomes or saturated OLS + - Conditional parallel trends + - ASF-based ATT (OLS, logit, Poisson) * - ``BaconDecomposition`` - TWFE diagnostic - (diagnostic tool) @@ -678,11 +691,6 @@ estimation. The depth of support varies by estimator: - **Diagnostic**: Weighted descriptive statistics only (no inference) - **--**: Not supported -.. note:: - - ``EfficientDiD`` does not support ``covariates`` and ``survey_design`` - simultaneously (the DR nuisance path does not yet thread survey weights). - .. 
note:: ``SyntheticDiD`` with ``variance_method='placebo'`` does not support diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index b89c06a7..e38b418e 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -55,6 +55,9 @@ groups: stacked_did: - diff_diff/stacked_did.py - diff_diff/stacked_did_results.py + wooldridge: + - diff_diff/wooldridge.py + - diff_diff/wooldridge_results.py visualization: - diff_diff/visualization/__init__.py - diff_diff/visualization/_common.py @@ -328,7 +331,28 @@ sources: - path: docs/choosing_estimator.rst type: user_guide - # ── TROP (trop group) ──────────────���─────────────────────────────── + # ── WooldridgeDiD (wooldridge group) ──────────────────────────────── + + diff_diff/wooldridge.py: + drift_risk: medium + docs: + - path: docs/methodology/REGISTRY.md + section: "WooldridgeDiD" + type: methodology + - path: docs/api/wooldridge_etwfe.rst + type: api_reference + - path: docs/tutorials/16_wooldridge_etwfe.ipynb + type: tutorial + - path: README.md + section: "WooldridgeDiD" + type: user_guide + - path: docs/llms-full.txt + section: "WooldridgeDiD" + type: user_guide + - path: docs/choosing_estimator.rst + type: user_guide + + # ── TROP (trop group) ────────────────────────────────────────────── diff_diff/trop.py: drift_risk: medium diff --git a/docs/llms-full.txt b/docs/llms-full.txt index d45b2c8c..6f6e1a6b 100644 --- a/docs/llms-full.txt +++ b/docs/llms-full.txt @@ -660,6 +660,121 @@ results.print_summary() plot_bacon(results) ``` +### StaggeredTripleDifference + +Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD estimator for designs with two eligibility criteria and staggered treatment timing. 
+ +```python +StaggeredTripleDifference( + estimation_method: str = "dr", # "dr", "ipw", or "reg" + control_group: str = "notyettreated", # "nevertreated" or "notyettreated" + alpha: float = 0.05, + anticipation: int = 0, + base_period: str = "varying", # "varying" or "universal" + n_bootstrap: int = 0, + bootstrap_weights: str = "rademacher", + seed: int | None = None, + cband: bool = True, + pscore_trim: float = 0.01, + cluster: str | None = None, + rank_deficient_action: str = "warn", +) +``` + +**Alias:** `SDDD` + +**fit() parameters:** + +```python +sddd.fit( + data: pd.DataFrame, + outcome: str, + unit: str, + time: str, + first_treat: str, + eligibility: str, # Binary eligibility column (0/1) + covariates: list[str] | None = None, + aggregate: str = "event_study", # "simple", "group", "event_study" + survey_design=None, +) -> StaggeredTripleDiffResults +``` + +**Usage:** + +```python +from diff_diff import StaggeredTripleDifference + +sddd = StaggeredTripleDifference(estimation_method='dr', n_bootstrap=999) +results = sddd.fit(data, outcome='y', unit='id', time='t', + first_treat='ft', eligibility='eligible', + aggregate='event_study') +results.print_summary() +``` + +### WooldridgeDiD + +Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. Supports OLS, logit, and Poisson QMLE paths with Average Structural Function (ASF) based ATT and delta-method standard errors. 
+ +```python +WooldridgeDiD( + method: str = "ols", # "ols", "logit", or "poisson" + control_group: str = "not_yet_treated", # "not_yet_treated" or "never_treated" + anticipation: int = 0, # Number of anticipation periods + demean_covariates: bool = True, # Demean covariates within cohort*period cells + alpha: float = 0.05, + cluster: str | None = None, + n_bootstrap: int = 0, + bootstrap_weights: str = "rademacher", + seed: int | None = None, + rank_deficient_action: str = "warn", +) +``` + +**Alias:** `ETWFE` + +**fit() parameters:** + +```python +etwfe.fit( + data: pd.DataFrame, + outcome: str, + unit: str, + time: str, + cohort: str, # First treatment period (0 or NaN = never treated) + exovar: list[str] | None = None, # Time-invariant covariates (no interaction/demeaning) + xtvar: list[str] | None = None, # Time-varying covariates (demeaned within cohort*period) + xgvar: list[str] | None = None, # Covariates interacted with each cohort indicator +) -> WooldridgeDiDResults +``` + +**Aggregation types:** + +```python +results.aggregate("simple") # Overall ATT +results.aggregate("group") # ATT by cohort +results.aggregate("calendar") # ATT by calendar period +results.aggregate("event") # ATT by event time (relative to treatment) +``` + +**Usage:** + +```python +from diff_diff import WooldridgeDiD + +# OLS (linear outcomes) +etwfe = WooldridgeDiD(method='ols') +results = etwfe.fit(data, outcome='y', unit='id', time='t', cohort='first_treat') +print(results.aggregate("simple")) + +# Logit (binary outcomes) +etwfe_logit = WooldridgeDiD(method='logit') +results = etwfe_logit.fit(data, outcome='y_bin', unit='id', time='t', cohort='first_treat') + +# Poisson (count outcomes) +etwfe_pois = WooldridgeDiD(method='poisson') +results = etwfe_pois.fit(data, outcome='y_count', unit='id', time='t', cohort='first_treat') +``` + ### Convenience Functions ```python @@ -1405,6 +1520,61 @@ data = load_card_krueger(force_download=True) clear_cache() ``` +## Survey Support + +All 
estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation. + +```python +from diff_diff import SurveyDesign, CallawaySantAnna + +# Create survey design with strata, PSU, FPC +sd = SurveyDesign( + weights='weight', # Sampling weight column + strata='stratum', # Stratification variable + psu='psu', # Primary Sampling Unit (cluster) + fpc='fpc', # Finite Population Correction + weight_type='pweight', # "pweight", "fweight", or "aweight" + nest=False, # PSU IDs nested within strata? + lonely_psu='remove', # "remove", "certainty", or "adjust" +) + +# Fit with survey design +cs = CallawaySantAnna(estimation_method='dr') +results = cs.fit(data, outcome='y', unit='id', time='t', + first_treat='ft', survey_design=sd) + +# Survey metadata on results +print(results.survey_metadata.deff) # Design effect +print(results.survey_metadata.effective_n) # Effective sample size +print(results.survey_metadata.df_survey) # Survey degrees of freedom +``` + +**Replicate weight designs:** + +```python +sd = SurveyDesign( + weights='weight', + replicate_weights=['rep_0', 'rep_1', ..., 'rep_79'], # Column names + replicate_method='SDR', # "BRR", "Fay", "JK1", "JKn", or "SDR" +) +``` + +**Subpopulation analysis:** + +```python +sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F') +``` + +**Key features:** +- Taylor Series Linearization (TSL) variance with strata + PSU + FPC +- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 15 estimators) +- Survey-aware bootstrap: multiplier at PSU or Rao-Wu rescaled +- DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`) +- Repeated cross-sections: `CallawaySantAnna(panel=False)` +- Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section + +No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. 
+ ## Linear Algebra Helpers ```python diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt index 08a682e8..53803eec 100644 --- a/docs/llms-practitioner.txt +++ b/docs/llms-practitioner.txt @@ -146,7 +146,8 @@ Is treatment adoption staggered (multiple cohorts, different timing)? | |-- ImputationDiD (BJS) -- most efficient under homogeneous effects | |-- TwoStageDiD (Gardner) -- two-stage with GMM variance | |-- StackedDiD (Stacked) -- sub-experiment approach -| \-- EfficientDiD (EDiD) -- optimal weighting for tighter SEs +| |-- EfficientDiD (EDiD) -- optimal weighting for tighter SEs +| \-- WooldridgeDiD (ETWFE) -- nonlinear outcomes (logit/Poisson) or saturated OLS | |-- NO, simple 2x2 design: | \-- DifferenceInDifferences (DiD) @@ -160,6 +161,9 @@ Is treatment adoption staggered (multiple cohorts, different timing)? |-- Continuous treatment (doses)? | \-- ContinuousDiD (CDiD) | +|-- Nonlinear outcome (binary, count)? +| \-- WooldridgeDiD (ETWFE) -- logit or Poisson QMLE with ASF-based ATT +| \-- Two eligibility criteria? \-- TripleDifference (DDD) ``` diff --git a/docs/llms.txt b/docs/llms.txt index fced72b7..44fda813 100644 --- a/docs/llms.txt +++ b/docs/llms.txt @@ -2,7 +2,7 @@ > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis. -diff-diff offers 14 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP. +diff-diff offers 15 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. 
It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP. - Install: `pip install diff-diff` - License: MIT @@ -63,6 +63,8 @@ Full practitioner guide: docs/llms-practitioner.txt - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs - [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment +- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered_triple_diff.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT +- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS, logit, Poisson QMLE with ASF-based ATT. 
Alias: ETWFE - [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html): Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings ## Diagnostics and Sensitivity Analysis @@ -90,6 +92,20 @@ Full practitioner guide: docs/llms-practitioner.txt - [13 Stacked DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/13_stacked_did.html): Stacked DiD — sub-experiments, Q-weights, event windows, trimming, clean control definitions - [14 Continuous DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/14_continuous_did.html): Continuous treatment DiD — dose-response curves, ATT(d), ACRT, B-splines, event study diagnostics - [15 Efficient DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/15_efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD — optimal weighting, PT-All vs PT-Post, efficiency gains +- [16 Survey DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/16_survey_did.html): Survey-weighted DiD — SurveyDesign, strata/PSU/FPC, replicate weights, subpopulation analysis, DEFF diagnostics +- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS/logit/Poisson, ASF-based ATT, aggregation types + +## Survey Support + +All estimators accept an optional `survey_design` parameter. 
Pass a `SurveyDesign` object to get design-based variance estimation: + +- **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest +- **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap +- **Diagnostics**: DEFF per coefficient, effective n, subpopulation analysis, weight trimming, CV on estimates +- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS +- **Compatibility matrix**: [Survey Design Support](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support) + +No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only. ## Optional diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index ed0fc046..287597b1 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -1,385 +1,238 @@ # Survey Data Support Roadmap This document captures the survey data support roadmap for diff-diff. -Phases 1-8f are implemented. Phase 8g (documentation-only items) is partially -addressed. Remaining deferred items are listed at the bottom. 
- -## Implemented (Phases 1-2) - -- **SurveyDesign** class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu -- **Weighted OLS** via sqrt(w) transformation in `solve_ols()` -- **Weighted sandwich estimator** in `compute_robust_vcov()` (pweight/fweight/aweight) -- **Taylor Series Linearization** (TSL) variance with strata + PSU + FPC -- **Survey degrees of freedom**: n_PSU - n_strata -- **Base estimator integration**: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD -- **Weighted demeaning**: `demean_by_group()` and `within_transform()` with weights -- **SurveyMetadata** in results objects with effective n, DEFF, weight_range - -## Implemented (Phase 3): OLS-Based Standalone Estimators - -| Estimator | File | Survey Support | Notes | -|-----------|------|----------------|-------| -| StackedDiD | `stacked_did.py` | pweight only | Q-weights compose multiplicatively with survey weights; TSL vcov on composed weights; fweight/aweight rejected (composition changes weight semantics) | -| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap via Rao-Wu rescaled (Phase 6) | -| BaconDecomposition | `bacon.py` | Diagnostic | Weighted cell means, weighted within-transform, weighted group shares; no inference (diagnostic only) | -| TripleDifference | `triple_diff.py` | Full | Regression, IPW, and DR methods with weighted OLS/logit + TSL on influence functions | -| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap via multiplier at PSU (Phase 6) | -| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap via multiplier at PSU (Phase 6) | - -### Phase 3 Deferred Work - -The following capabilities were deferred from Phase 3 because they depend on -Phase 5 infrastructure (bootstrap+survey interaction): - -| Estimator | Deferred Capability | Blocker | 
-|-----------|-------------------|---------| -| SunAbraham | Pairs bootstrap + survey | **Resolved** (Phase 6, Rao-Wu rescaled) | -| ContinuousDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | -| EfficientDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | -| EfficientDiD | Covariates (DR path) + survey | **Resolved**: survey weights threaded through DR nuisance estimation | - -All blocked combinations raise `NotImplementedError` when attempted, with a -message pointing to the planned phase or describing the limitation. - -## Implemented (Phase 4): Complex Standalone Estimators + Weighted Logit - -| Estimator | File | Survey Support | Notes | -|-----------|------|----------------|-------| -| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap via multiplier at PSU (Phase 6) | -| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap via multiplier at PSU (Phase 6) | -| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR supports covariates (Phase 7a); survey-weighted WIF in aggregation; replicate IF variance for analytical SEs | - -**Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights -enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked -TripleDifference IPW/DR from Phase 3 deferred work. 
- -### Phase 4 Deferred Work - -| Estimator | Deferred Capability | Blocker | -|-----------|-------------------|---------| -| ImputationDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | -| TwoStageDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | -| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | -| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) | -| CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) | -| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Deferred — code uses conservative plug-in IF (see REGISTRY.md) | - -## Implemented (Phase 5): SyntheticDiD + TROP Survey Support - -| Estimator | File | Survey Support | Notes | -|-----------|------|----------------|-------| -| SyntheticDiD | `synthetic_did.py` | pweight | Both sides weighted (WLS interpretation): treated means survey-weighted, ω composed with control survey weights post-optimization. Placebo and bootstrap SE preserve survey weights. Covariates residualized with WLS. | -| TROP | `trop.py` | pweight | ATT aggregation only: population-weighted average of per-observation treatment effects. Model fitting (kernel weights, LOOCV, nuclear norm) unchanged. Rust and Python bootstrap paths both support weighted ATT. | - -### Phase 5 Deferred Work - -| Estimator | Deferred Capability | Status | -|-----------|-------------------|--------| -| SyntheticDiD | strata/PSU/FPC + survey-aware bootstrap | Resolved in Phase 6 | -| TROP | strata/PSU/FPC + survey-aware bootstrap | Resolved in Phase 6 | - -## Phase 6: Advanced Features - -### Bootstrap + Survey Interaction ✅ (2026-03-24) -Survey-aware bootstrap for all 8 bootstrap-using estimators. 
Two strategies: -- **Multiplier at PSU level** (CS, ImputationDiD, TwoStageDiD, ContinuousDiD, - EfficientDiD): generate multiplier weights at PSU level within strata, FPC-scaled. -- **Rao-Wu rescaled** (SunAbraham, SyntheticDiD, TROP): draw PSUs within strata, - rescale observation weights per Rao, Wu & Yue (1992). -- **CS analytical expansion**: strata/PSU/FPC now supported for aggregated SEs via - `compute_survey_if_variance()`. -- **TROP**: cross-classified pseudo-strata (survey_stratum × treatment_group). -- **Rust TROP**: pweight-only in Rust; full design falls to Python path. - -### Replicate Weight Variance ✅ (2026-03-26) -Re-run WLS for each replicate weight column, compute variance from distribution -of estimates. Supports BRR, Fay's BRR, JK1, JKn, and SDR methods. -JKn requires explicit `replicate_strata` (per-replicate stratum assignment). -- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign -- `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate) -- `compute_replicate_if_variance()` for IF-based estimators (reweights IF) -- Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py` -- Replicate weights mutually exclusive with strata/PSU/FPC -- Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()` -- **Coverage**: 12 of 15 estimators support replicate weights. - Supported: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, - CallawaySantAnna, TripleDifference (analytical only), StaggeredTripleDifference, - SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD. - Rejected: SyntheticDiD, TROP (no published theory on replicate weights + - unit weight optimization / nuclear norm regularization), BaconDecomposition - (diagnostic tool, no inference). - -### DEFF Diagnostics ✅ (2026-03-26) -Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov. 
-- `DEFFDiagnostics` dataclass with per-coefficient DEFF, effective n, SRS/survey SE -- `compute_deff_diagnostics()` standalone function -- `LinearRegression.compute_deff()` post-fit method -- Display label "Kish DEFF (weights)" for existing weight-based DEFF - -### Subpopulation Analysis ✅ (2026-03-26) -`SurveyDesign.subpopulation(data, mask)` — zero-out weights for excluded -observations while preserving the full design structure for correct variance -estimation (unlike simple subsetting, which would drop design information). -- Mask: bool array/Series, column name, or callable -- Returns (SurveyDesign, DataFrame) pair with synthetic `_subpop_weight` column -- Weight validation relaxed: zero weights allowed (negative still rejected) +Phases 1-9 are complete. Phase 10 covers the credibility and announcement +readiness work still ahead. --- -## Phase 7: Completing the Survey Story - -These items close the remaining gaps that matter for practitioners using major -population surveys (ACS, CPS, BRFSS, MEPS) with modern DiD methods. Together -they make diff-diff the only package — R or Python — with full design-based -variance estimation for heterogeneity-robust DiD estimators. - -### 7a. CallawaySantAnna Covariates + IPW/DR + Survey ✅ - -**Implemented.** DRDID panel nuisance-estimation IF corrections (PS + OR) -for both survey and non-survey IPW/DR paths. Survey-weighted propensity -score estimation via `solve_logit()`, survey-weighted outcome regression, -and IFs that account for nuisance parameter estimation uncertainty under -the survey design (Sant'Anna & Zhao 2020, Theorem 3.1). - -**Reference:** Sant'Anna, P.H.C. & Zhao, J. (2020). "Doubly Robust -Difference-in-Differences Estimators." *Journal of Econometrics* 219(1). - -### 7b. Repeated Cross-Sections ✅ - -**Priority: High.** Many major surveys (BRFSS, ACS annual cross-sections, -CPS monthly) are not panels — units are not followed over time. The R `did` -package supports `panel=FALSE` for these settings. 
diff-diff currently -requires panel data for all staggered estimators. - -**Implemented.** `CallawaySantAnna(panel=False)` for repeated cross-section -surveys. Uses cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4): -`reg` matches `DRDID::reg_did_rc`, `dr` matches `DRDID::drdid_rc` (locally -efficient with 4 OLS fits), `ipw` matches `DRDID::std_ipw_did_rc`. Survey -weights, covariates, and all estimation methods supported. - -**Reference:** Sant'Anna, P.H.C. & Zhao, J. (2020). Sections 3 (panel) vs -4 (repeated cross-sections). Callaway, B. & Sant'Anna, P.H.C. (2021). -Section 4.1. - -### 7c. Survey-Aware DiD Tutorial ✅ - -**Implemented.** `docs/tutorials/16_survey_did.ipynb` — 35-cell tutorial -framed around a state-level preventive care program evaluated with a -stratified health survey (ACS/BRFSS-like). Covers: why survey design -matters, SurveyDesign setup, basic DiD with survey, staggered DiD -(CallawaySantAnna) with survey, replicate weights (JK1), subpopulation -analysis, DEFF diagnostics, repeated cross-sections, and a link to the -compatibility matrix in `choosing_estimator.rst`. Uses `generate_survey_did_data()` DGP function -added to `diff_diff.prep`. - -### 7d. HonestDiD with Survey Variance ✅ - -**Implemented.** Survey df and full event-study VCV from IF vectors -propagated to HonestDiD sensitivity analysis. When CallawaySantAnna is fit -with a survey design, HonestDiD uses t-distribution critical values with -survey degrees of freedom. Bootstrap/replicate designs fall back to -diagonal VCV with a warning. - -**Reference:** Rambachan, A. & Roth, J. (2023). "A More Credible Approach -to Parallel Trends." *Review of Economic Studies* 90(5). - -### 7e. StaggeredTripleDifference Survey Support ✅ - -**Implemented.** Full survey design support (pweight only) for the -Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD estimator. 
Survey -weights thread through all three pairwise DiD comparisons: propensity -score estimation (weighted IRLS via `solve_logit`), outcome regression -(WLS via `solve_ols`), and Riesz representer computation (weighted -Hajek normalization). IF combination weights (w1/w2/w3) use -survey-weighted cell sizes. Per-cell SEs use the standard IF formula -(matching CallawaySantAnna); aggregated SEs (overall, event study, -group) use design-based variance via `compute_survey_if_variance()` -(TSL with strata/PSU/FPC) or `compute_replicate_if_variance()` -(replicate weights). Bootstrap uses PSU-level multiplier weights -when survey design is present. - -**Note:** The R `triplediff` package does not support survey weights. -diff-diff is the only implementation (R or Python) with design-based -variance estimation for staggered triple differences. - -**Reference:** Ortiz-Villavicencio, M. & Sant'Anna, P.H.C. (2025). -"Better Understanding Triple Differences Estimators." arXiv:2505.09942. +## What's Shipped + +### Phases 1-2: Core Infrastructure + +- `SurveyDesign` class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu +- Taylor Series Linearization (TSL) variance with strata + PSU + FPC +- Weighted OLS, sandwich estimator, demeaning, survey degrees of freedom +- `SurveyMetadata` on results (effective n, DEFF, weight_range) +- Base estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD + +### Phase 3: OLS-Based Standalone Estimators + +| Estimator | Survey Support | Notes | +|-----------|----------------|-------| +| StackedDiD | pweight only | Q-weights compose multiplicatively; fweight/aweight rejected | +| SunAbraham | Full | Bootstrap via Rao-Wu rescaled | +| BaconDecomposition | Diagnostic | Weighted descriptives only, no inference | +| TripleDifference | Full | Regression, IPW, and DR methods with TSL on IFs | +| ContinuousDiD | Full | Weighted B-spline OLS + TSL; bootstrap via multiplier at PSU | +| EfficientDiD | Full | No-cov and DR covariate 
paths both survey-weighted; bootstrap via multiplier at PSU | + +### Phase 4: Complex Estimators + Weighted Logit + +| Estimator | Survey Support | Notes | +|-----------|----------------|-------| +| ImputationDiD | Full | Weighted iterative FE + conservative variance; bootstrap via multiplier at PSU | +| TwoStageDiD | Full | Weighted FE + GMM sandwich; bootstrap via multiplier at PSU | +| CallawaySantAnna | Full | Strata/PSU/FPC/replicate weights; IPW/DR covariates (Phase 7a); replicate IF variance | + +Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as +`w_survey * mu * (1 - mu)`. + +### Phase 5: SyntheticDiD + TROP + +| Estimator | Survey Support | Notes | +|-----------|----------------|-------| +| SyntheticDiD | pweight | Treated means survey-weighted; omega composed with control weights post-optimization | +| TROP | pweight | Population-weighted ATT aggregation; model fitting unchanged | + +### Phase 6: Advanced Features (v2.7.6) + +- **Survey-aware bootstrap** for all 8 bootstrap-using estimators: + multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient) + and Rao-Wu rescaled (SA, SyntheticDiD, TROP) +- **Replicate weight variance**: BRR, Fay's BRR, JK1, JKn, SDR. 
+ 12 of 15 estimators supported (not SyntheticDiD, TROP, or BaconDecomposition) +- **DEFF diagnostics**: per-coefficient design effects vs SRS baseline +- **Subpopulation analysis**: `SurveyDesign.subpopulation()` preserves + full design structure for correct variance + +### Phase 7: Completing the Survey Story (v2.8.0-v2.8.1) + +- **7a.** CS IPW/DR covariates + survey: DRDID nuisance IF corrections + (Sant'Anna & Zhao 2020, Theorem 3.1) +- **7b.** Repeated cross-sections: `CallawaySantAnna(panel=False)` matching + `DRDID::reg_did_rc`, `drdid_rc`, `std_ipw_did_rc` +- **7c.** Survey tutorial: `docs/tutorials/16_survey_did.ipynb` with full + workflow (strata, PSU, FPC, replicates, subpopulation, DEFF) +- **7d.** HonestDiD + survey: survey df and event-study VCV propagated + to sensitivity analysis with t-distribution critical values +- **7e.** StaggeredTripleDifference survey support (only implementation + in R or Python with design-based DDD variance) + +### Phase 8: Survey Maturity (v2.8.3-v2.8.4) + +- **8a.** SDR replicate method for ACS PUMS (80 columns) +- **8b.** FPC in ImputationDiD and TwoStageDiD +- **8c.** Silent operation warnings (8 operations now emit `UserWarning`) +- **8d.** Lonely PSU "adjust" in bootstrap (Rust & Rao 1996) +- **8e.** CV on estimates, `trim_weights()`, survey-aware ImputationDiD pretrends +- **8f.** Compatibility matrix in `choosing_estimator.rst` + +### Phase 9: Real-Data Validation (v2.9.0) + +15 cross-validation tests against R's `survey` package using real federal +survey datasets: + +| Dataset | Design | Key result | +|---------|--------|------------| +| API (R `survey`) | Strata + FPC | ATT, SE, df, CI match R (7 variants incl. subpopulation, Fay's BRR) | +| NHANES (CDC/NCHS) | Strata + PSU (nest=TRUE) | ACA DiD matches R for strata+PSU, covariates, subpopulation | +| RECS 2020 (U.S. 
EIA) | 60 JK1 replicate weights | Coefficients, SEs, df, CI match R | + +Files: `benchmarks/R/benchmark_realdata_*.R`, `tests/test_survey_real_data.py`, +`benchmarks/data/real/*_realdata_golden.json` + +### Documentation Remaining (Phase 8g) + +- **Multi-stage design**: not yet documented. Single-stage (strata + PSU) + is sufficient per Lumley (2004) Section 2.2. +- **Post-stratification / calibration**: not yet documented. `SurveyDesign` + expects pre-calibrated weights. `samplics` is the most complete Python + option (post-stratification, raking, GREG) but is in read-only mode — + active development has moved to `svy`, which is not yet publicly + released. `weightipy` is actively maintained for raking. Weight + calibration is out of scope for diff-diff today, though building this + capability is a future possibility. --- -## Phase 8: Survey Maturity - -Refinements to close remaining gaps versus R's `survey` package and improve -practitioner experience. Prioritized by user impact. +## Phase 10: Academic Credibility and Announcement Readiness -### 8a. Successive Difference Replication (SDR) ✅ +Before broadly announcing survey capability, these items establish the +theoretical and empirical foundation needed for credibility with +practitioners and methodologists. -**Shipped in v2.8.4.** ACS PUMS — the most common US survey dataset for DiD -policy evaluation — provides 80 SDR replicate weight columns. -`SurveyDesign(replicate_method="SDR")` with variance formula -`V = 4/R * sum((theta_r - theta)^2)`. +### 10a. Theory Document (HIGH priority) -**Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and -Model-Based Postcensal Estimation of Income and Poverty Characteristics -for States and Counties." ASA Proceedings. +Write `docs/methodology/survey-theory.md` laying out the formal argument +for design-based variance estimation with modern DiD influence functions: -### 8b. FPC in ImputationDiD and TwoStageDiD ✅ +1. 
Modern heterogeneity-robust DiD estimators (CS, SA, BJS) are smooth + functionals of the weighted empirical distribution +2. Survey-weighted empirical distribution is design-consistent for the + superpopulation quantity (Horvitz-Thompson) +3. The influence function is a property of the functional, not the + sampling design — IFs remain valid under survey weighting +4. TSL (stratified cluster sandwich) and replicate-weight methods are + valid variance estimators for smooth functionals of survey-weighted + estimating equations (Binder 1983, Rao & Wu 1988, Shao 1996) -**Shipped in v2.8.4.** Both estimators now have full strata/PSU/FPC -support. FPC is threaded through the existing TSL variance path. +This is the short-term deliverable that can be linked from docs and README +immediately. -### 8c. Silent Operation Warnings ✅ +**Key references:** +- Binder, D.A. (1983). "On the Variances of Asymptotically Normal + Estimators from Complex Surveys." *International Statistical Review* 51. +- Rao, J.N.K. & Wu, C.F.J. (1988). "Resampling Inference with Complex + Survey Data." *JASA* 83(401). +- Shao, J. (1996). "Resampling Methods in Sample Surveys." *Statistics* 27. -**Shipped in v2.8.3.** Eight operations that previously altered analysis -results without informing the user now emit `UserWarning`: -TROP lstsq fallback, TwoStageDiD NaN masking, TwoStageDiD always-treated -removal, CallawaySantAnna (g,t) pair skipping, TROP treatment indicator -fill, Rust → Python fallback, survey weight normalization, `np.inf` → 0 -never-treated conversion. +### 10b. Survey Simulation DGP (HIGH priority) -### 8d. Lonely PSU "adjust" in Bootstrap ✅ +Build a research-grade DGP that generates realistic complex survey data +over a staggered treatment adoption panel. The existing `generate_survey_did_data()` +tests code correctness but lacks the properties needed for statistical +coverage studies and compelling tutorials. 
The new DGP needs: -**Shipped in v2.8.4.** `lonely_psu="adjust"` now works with survey-aware -bootstrap using Rust & Rao (1996) grand-mean centering. +- Known stratified cluster structure with varying PSU sizes +- Controllable intra-cluster correlation (so true DEFF is known) +- Known treatment effects (so coverage of 95% CIs can be measured) +- Enough design complexity to show where flat weights fail (clustering + inflates variance, stratification reduces it, FPC matters for small + populations) -**Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation -for Complex Surveys Using Replication Techniques." Statistical Methods -in Medical Research 5(3). +This is a dependency for both 10c (tutorial) and 10d (paper simulation +study). Add to `diff_diff.prep` alongside the existing DGP functions. -### 8e. Survey Diagnostics and Utilities ✅ +### 10c. Expand R Validation Coverage (HIGH priority) -**Shipped in v2.8.4.** -- **CV on estimates**: `coef_var` property on all results objects (SE/|estimate|). - Handles edge cases (SE=0, estimate=0). -- **Weight trimming**: `trim_weights(data, weight_col, upper=None, lower=None, - quantile=None)` in `prep.py` for capping extreme survey weights. -- **ImputationDiD pretrends + survey**: pre-trends F-test now survey-aware - using subpopulation approach for correct variance under complex designs. +Current R-validated estimators: DifferenceInDifferences, TWFE, +CallawaySantAnna, SyntheticDiD (4 of 15). We can validate the OLS +regression path against R's `survey::svyglm()` for estimators that +reduce to WLS: -### 8f. 
Survey Compatibility Matrix ✅ +| Estimator | Validation approach | Status | +|-----------|-------------------|--------| +| ImputationDiD | Compare WLS step against `svyglm()` | Not started | +| StackedDiD | Compare stacked WLS against `svyglm()` | Not started | +| SunAbraham | Compare interaction-weighted WLS against `svyglm()` | Not started | +| TripleDifference | Compare DDD regression against `svyglm()` | Not started | +| EfficientDiD | No R reference exists | Deferred | +| TROP | No R reference exists | Deferred | -**Shipped.** Full compatibility table added to `docs/choosing_estimator.rst` -(Survey Design Support section) showing estimator × survey feature -combinations. Tutorial cross-references this table. +### 10d. Tutorial: Show the Pain (HIGH priority) -### 8g. Documentation-Only Items +Expand the survey tutorial with a side-by-side comparison using the DGP +from 10b: -**Partially addressed.** No code changes required. Remaining items -deferred to the consolidated list below: -- **Multi-stage design**: document that single-stage (strata + PSU) - is sufficient for variance estimation per Lumley (2004) Section 2.2. -- **Post-stratification / calibration**: document that `SurveyDesign` - expects pre-calibrated weights. Point users to `samplics` or R's - `survey::calibrate()` for weight calibration. +- ATT with flat weights (what R's `did` package gives you) +- ATT with full survey design (what diff-diff gives you) +- DEFF showing how much SEs were underestimated +- An example where inference conclusions change ---- +Because the DGP has known parameters, the tutorial can show not just that +the results differ, but which one is *right*. This is the content that +practitioners share and that converts skeptics. -## Phase 9: Real-Data Validation +### 10e. Position Paper / arXiv Preprint (MEDIUM priority, long-term) -Complements the synthetic-data cross-validation (Phase 6 BRR, Tier 1-3 in -`benchmark_survey_crossvalidation.R`) with real federal survey datasets. 
-Validates that diff-diff's survey variance matches R's `survey` package -when fed actual survey design variables from published data sources. +A 15-25 page methodology note targeting JSSAM, simultaneously posted to +arXiv. Theory (~5pp), simulation study using DGP from 10b (~8pp), +empirical illustration with NHANES ACA data (~3pp), software section +(~2pp). -### Datasets +**Ideal co-author:** Pedro Sant'Anna — derived the IFs in CS/DRDID and +can vouch they are valid under survey weighting. The survey statistics +(Binder 1983, Rao & Wu 1988) are established and don't need a survey +methodologist to co-sign. -| Dataset | Source | Design | Validates | -|---------|--------|--------|-----------| -| API (apistrat) | R `survey` package | Strata + FPC + weights | TSL variance, subpopulations, Fay's BRR | -| NHANES 2007-08 / 2015-16 | CDC/NCHS | Strata + PSU + weights (nest=TRUE) | TSL with real clustering, ACA DiD | -| RECS 2020 | U.S. EIA | 60 JK1 replicate weights | JK1 replicate weight variance | +### 10f. WooldridgeDiD Survey Support (MEDIUM priority) -### Results +WooldridgeDiD (ETWFE) is the only estimator without `survey_design` +support. The OLS path is straightforward (same as TWFE); the logit/Poisson +paths require survey-weighted IRLS. Should be added to the compatibility +matrix regardless of whether support is implemented before announcement. -- **Suite A (API):** 8 tests — ATT, SE, df, CI match R across 7 design - variants including subpopulation and Fay's BRR replicates. Known - differences: TWFE (A4) validates ATT only (SE differs due to unit FE); - subpopulation (A5) validates ATT/SE but df differs (strata preservation - vs R's `subset()` — documented deviation in REGISTRY.md). -- **Suite B (NHANES):** 4 tests — ATT, SE, df, CI match R for - strata+PSU+weights, covariates, weights-only, and female subpopulation. -- **Suite C (RECS):** 3 tests — JK1 regression coefficients, SEs, df, - and CI match R with 60 real replicate weight columns. 
DEFF validated - as finite/positive (different naive baselines prevent exact comparison). +### 10g. Practitioner Guidance (LOW priority) -### Files - -- R scripts: `benchmarks/R/benchmark_realdata_{api,nhanes,recs}.R` -- Download scripts: `benchmarks/scripts/download_{nhanes,recs}.py` -- Python tests: `tests/test_survey_real_data.py` -- Golden values: `benchmarks/data/real/*_realdata_golden.json` +A decision flowchart helping practitioners decide whether they need full +survey design or whether flat weights suffice. Key factors: ICC, number +of PSUs, stratification gain, DEFF magnitude. DEFF diagnostics provide +the empirical answer, but practitioners need guidance on interpretation. --- -## Deferred Work (Consolidated) - -### Documented Deviations - -These are supported paths that use a conservative or simplified approach -rather than the theoretically optimal one. They do not raise errors. - -| Estimator | Deviation | Details | -|-----------|-----------|---------| -| CallawaySantAnna | `reg`+covariates uses conservative plug-in IF | Efficient DRDID nuisance IF correction deferred; see REGISTRY.md | - -### Runtime Limitations - -All items below raise an error when attempted (`NotImplementedError` or -`ValueError` depending on the estimator), with a message describing the -limitation. See also `TODO.md` for general tech debt items (e.g., -multi-absorb + survey weights). 
- -### Replicate Weights Not Supported - -| Estimator | Reason | -|-----------|--------| -| SyntheticDiD | No published theory on replicate weights + unit weight optimization | -| TROP | No published theory on replicate weights + nuclear norm regularization | -| BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply | - -### EfficientDiD Survey Limitations +## Current Limitations -| Limitation | Reason | -|-----------|--------| -| `cluster` + `survey_design` | Use `survey_design` with PSU/strata instead | +All items below raise an error when attempted, with a message describing +the limitation and suggested alternative. -### Bootstrap + Replicate Weights (Mutual Exclusion) - -Replicate weights and bootstrap are alternative variance estimation methods. -Combining them raises `NotImplementedError` or `ValueError`: - -| Estimator | -|-----------| -| CallawaySantAnna | -| ContinuousDiD | -| EfficientDiD | -| ImputationDiD | -| StaggeredTripleDifference | -| SunAbraham | -| TwoStageDiD | - -### Other Limitations - -| Estimator | Limitation | Reason | -|-----------|-----------|--------| +| Estimator | Limitation | Alternative | +|-----------|-----------|-------------| +| WooldridgeDiD | No `survey_design` support | Not yet implemented (see 10f) | +| SyntheticDiD | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap | +| TROP | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap | +| BaconDecomposition | Replicate weights | Diagnostic only, no inference | | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` | -| ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | -| ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | -| DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey 
inference (the default) instead | - -### Warning/Fallback Behaviors - -These do not raise errors but silently change behavior: - -| Estimator | Limitation | Behavior | -|-----------|-----------|----------| -| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference | - ---- - -## Remaining Documentation Tasks (Phase 8g) - -These are documentation improvements, not runtime limitations: - -- **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed. -- **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration. +| ImputationDiD | `pretrends=True` + replicate weights | Use analytical survey design instead | +| ImputationDiD | `pretrend_test()` + replicate weights | Use analytical survey design instead | +| DiD, TWFE | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (default) | +| EfficientDiD | `cluster` + `survey_design` | Use `survey_design` with PSU/strata | +| All bootstrap estimators | Bootstrap + replicate weights | These are alternative variance methods; pick one | + +**Warning/fallback (no error):** MultiPeriodDiD with `wild_bootstrap` + +`survey_design` warns and falls back to analytical inference. + +**Conservative approach (no error):** CallawaySantAnna `reg`+covariates +uses conservative plug-in IF rather than efficient DRDID nuisance IF +correction (see REGISTRY.md). 
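For reference, the SDR replicate-variance formula quoted in the removed Phase 8a text of this patch, `V = 4/R * sum((theta_r - theta)^2)`, can be sketched in plain Python. This is an illustrative sketch only, not diff-diff's implementation: the helper name `sdr_variance` and the sample numbers are hypothetical, and it assumes the full-sample estimate and one re-estimated value per replicate weight column are already available.

```python
import math


def sdr_variance(theta_full, theta_replicates):
    """Successive difference replication variance: V = 4/R * sum((theta_r - theta)^2).

    theta_full: estimate computed with the full-sample weights.
    theta_replicates: one estimate per replicate weight column (hypothetical inputs).
    """
    r = len(theta_replicates)
    if r == 0:
        raise ValueError("need at least one replicate estimate")
    # Sum of squared deviations of each replicate estimate from the full-sample estimate.
    squared_deviations = sum((t - theta_full) ** 2 for t in theta_replicates)
    return 4.0 / r * squared_deviations


# Made-up numbers purely for illustration.
theta_hat = 2.0
replicates = [1.9, 2.1, 2.05, 1.95]

variance = sdr_variance(theta_hat, replicates)
standard_error = math.sqrt(variance)
print(f"variance={variance:.6f}, se={standard_error:.6f}")
```

With the 80 SDR columns that the patch text says ACS PUMS provides, `theta_replicates` would hold 80 re-estimated values, one per replicate weight column.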
From 4cfd68f0512a8b880a99bbbccdcd1399cccd75e2 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sun, 5 Apr 2026 08:28:18 -0400
Subject: [PATCH 2/6] Address AI review: qualify survey claims, fix API drift and examples

- P1: Replace blanket "all estimators" survey claims with "14 of 15" or explicit WooldridgeDiD exception in ROADMAP.md, llms.txt, llms-full.txt
- P2: Fix SyntheticDiD/TROP alternative wording: "Rao-Wu rescaled bootstrap" not "TSL" in survey-roadmap.md
- P2: Sync StaggeredTripleDifference snippet in llms-full.txt to live signature (add epv_threshold, pscore_fallback, balance_e; fix aggregate default)
- P2: Fix survey examples: .design_effect not .deff, subpopulation() returns (SurveyDesign, DataFrame) tuple

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ROADMAP.md             |  2 +-
 docs/llms-full.txt     | 13 ++++++++-----
 docs/llms.txt          |  2 +-
 docs/survey-roadmap.md |  4 ++--
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 68824d8d..1378e191 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,7 +28,7 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 
 ### Survey Support
 
-`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 15 estimators accept survey weights; design-based variance estimation varies by estimator:
+`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling.
14 of 15 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator: - **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC - **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators diff --git a/docs/llms-full.txt b/docs/llms-full.txt index 6f6e1a6b..77b26404 100644 --- a/docs/llms-full.txt +++ b/docs/llms-full.txt @@ -678,6 +678,8 @@ StaggeredTripleDifference( pscore_trim: float = 0.01, cluster: str | None = None, rank_deficient_action: str = "warn", + epv_threshold: float = 10, # Min events-per-variable for propensity score + pscore_fallback: str = "error", # "error" or "unconditional" ) ``` @@ -694,7 +696,8 @@ sddd.fit( first_treat: str, eligibility: str, # Binary eligibility column (0/1) covariates: list[str] | None = None, - aggregate: str = "event_study", # "simple", "group", "event_study" + aggregate: str | None = None, # None, "simple", "group", "event_study", "all" + balance_e: int | None = None, # Max event time for balanced event study survey_design=None, ) -> StaggeredTripleDiffResults ``` @@ -1522,7 +1525,7 @@ clear_cache() ## Survey Support -All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation. +All estimators except `WooldridgeDiD` accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation. (WooldridgeDiD survey support is planned for Phase 10f.) 
```python from diff_diff import SurveyDesign, CallawaySantAnna @@ -1544,7 +1547,7 @@ results = cs.fit(data, outcome='y', unit='id', time='t', first_treat='ft', survey_design=sd) # Survey metadata on results -print(results.survey_metadata.deff) # Design effect +print(results.survey_metadata.design_effect) # Design effect print(results.survey_metadata.effective_n) # Effective sample size print(results.survey_metadata.df_survey) # Survey degrees of freedom ``` @@ -1562,7 +1565,7 @@ sd = SurveyDesign( **Subpopulation analysis:** ```python -sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F') +sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F') ``` **Key features:** @@ -1573,7 +1576,7 @@ sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F') - Repeated cross-sections: `CallawaySantAnna(panel=False)` - Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section -No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. +No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. (`WooldridgeDiD` does not yet accept `survey_design` — planned for Phase 10f.) ## Linear Algebra Helpers diff --git a/docs/llms.txt b/docs/llms.txt index 44fda813..e9210a8c 100644 --- a/docs/llms.txt +++ b/docs/llms.txt @@ -105,7 +105,7 @@ All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesig - **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS - **Compatibility matrix**: [Survey Design Support](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support) -No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only. 
+No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only. (Note: `WooldridgeDiD` does not yet accept `survey_design` — planned for Phase 10f.)

 ## Optional

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 287597b1..32465c42 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -220,8 +220,8 @@ the limitation and suggested alternative.
 | Estimator | Limitation | Alternative |
 |-----------|-----------|-------------|
 | WooldridgeDiD | No `survey_design` support | Not yet implemented (see 10f) |
-| SyntheticDiD | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
-| TROP | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
+| SyntheticDiD | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
+| TROP | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
 | BaconDecomposition | Replicate weights | Diagnostic only, no inference |
 | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
 | ImputationDiD | `pretrends=True` + replicate weights | Use analytical survey design instead |

From 078f48e7661daabd102d3b429ce35045baf23cc1 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sun, 5 Apr 2026 08:39:31 -0400
Subject: [PATCH 3/6] Address AI review round 2: fix remaining survey claim inconsistencies

- P1: llms.txt survey section now says "except WooldridgeDiD" at the top
- P1: survey-roadmap.md replicate weight count updated to 12 of 16 (adds WooldridgeDiD to unsupported list)
- P3: llms.txt estimator count corrected from 15 to 16

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/llms.txt          | 4 ++--
 docs/survey-roadmap.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/llms.txt b/docs/llms.txt
index e9210a8c..e29ffeb9 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -2,7 +2,7 @@

 > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis.

-diff-diff offers 15 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
+diff-diff offers 16 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.

 - Install: `pip install diff-diff`
 - License: MIT
@@ -97,7 +97,7 @@ Full practitioner guide: docs/llms-practitioner.txt

 ## Survey Support

-All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
+All estimators except `WooldridgeDiD` accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:

 - **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
 - **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 32465c42..6e56db61 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -51,7 +51,7 @@ Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as
   multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient) and
   Rao-Wu rescaled (SA, SyntheticDiD, TROP)
 - **Replicate weight variance**: BRR, Fay's BRR, JK1, JKn, SDR.
-  12 of 15 estimators supported (not SyntheticDiD, TROP, or BaconDecomposition)
+  12 of 16 estimators supported (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
 - **DEFF diagnostics**: per-coefficient design effects vs SRS baseline
 - **Subpopulation analysis**: `SurveyDesign.subpopulation()` preserves
   full design structure for correct variance

From b6b7e694254ed4b8849548d0f2e7fbff12c0b909 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sun, 5 Apr 2026 08:54:06 -0400
Subject: [PATCH 4/6] Address AI review round 3: choosing_estimator survey exclusion, DDD split

- P1: choosing_estimator.rst survey intro now excludes WooldridgeDiD, added WooldridgeDiD row to compatibility matrix with "--" across all columns
- P1: Split DDD selection guidance into TripleDifference (2x2x2) vs StaggeredTripleDifference (staggered timing) in llms-practitioner.txt, llms-full.txt, and choosing_estimator.rst comparison table
- Added WooldridgeDiD to llms-full.txt estimator selection table

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/choosing_estimator.rst | 17 ++++++++++++++---
 docs/llms-full.txt          |  4 +++-
 docs/llms-practitioner.txt  |  5 +++--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index a74269cc..24f32e85 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -593,9 +593,10 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------

-All estimators accept an optional ``survey_design`` parameter in ``fit()``.
-Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
-estimation. The depth of support varies by estimator:
+All estimators except :class:`~diff_diff.WooldridgeDiD` accept an optional
+``survey_design`` parameter in ``fit()``. Pass a :class:`~diff_diff.SurveyDesign`
+object to get design-based variance estimation. The depth of support varies by
+estimator (WooldridgeDiD survey support is planned for Phase 10f):

 .. list-table::
    :header-rows: 1
@@ -676,12 +677,22 @@ estimation. The depth of support varies by estimator:
      - Via bootstrap
      - --
      - Rao-Wu rescaled
+   * - ``WooldridgeDiD``
+     - --
+     - --
+     - --
+     - --
    * - ``BaconDecomposition``
      - Diagnostic
      - Diagnostic
      - --
      - --

+.. note::
+
+   ``WooldridgeDiD`` does not yet accept ``survey_design``. Survey support
+   is planned for Phase 10f. See :doc:`/survey-roadmap` for details.
+
 **Legend:**

 - **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index 77b26404..a947268f 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -1632,7 +1632,9 @@ DIFF_DIFF_BACKEND=rust pytest # Force Rust (fail if unavailable)
 | Few treated units / synthetic control | `SyntheticDiD` |
 | Interactive fixed effects / factor confounding | `TROP` |
 | Continuous treatment intensity | `ContinuousDiD` |
-| Two-criterion treatment (group + eligibility) | `TripleDifference` |
+| Two-criterion treatment, simultaneous (2x2x2 DDD) | `TripleDifference` |
+| Two-criterion treatment, staggered timing + eligibility | `StaggeredTripleDifference` |
+| Nonlinear outcome (binary/count) with staggered timing | `WooldridgeDiD` |
 | Diagnosing TWFE bias | `BaconDecomposition` |
 | Efficiency-optimal estimation | `EfficientDiD` |
 | Corrective weighting for stacked regressions | `StackedDiD` |
diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt
index 53803eec..d92de93e 100644
--- a/docs/llms-practitioner.txt
+++ b/docs/llms-practitioner.txt
@@ -164,8 +164,9 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- Nonlinear outcome (binary, count)?
 |   \-- WooldridgeDiD (ETWFE) -- logit or Poisson QMLE with ASF-based ATT
 |
-\-- Two eligibility criteria?
-    \-- TripleDifference (DDD)
+\-- Two eligibility criteria (DDD)?
+    |-- Simultaneous treatment (2x2x2): TripleDifference (DDD)
+    \-- Staggered timing + eligibility: StaggeredTripleDifference (SDDD)
 ```

 Always run BaconDecomposition first if using TWFE, to check for negative

From 67714b26723227d984ebcdeacfe7711a803509fc Mon Sep 17 00:00:00 2001
From: igerber
Date: Sun, 5 Apr 2026 09:01:36 -0400
Subject: [PATCH 5/6] Address AI review round 4: add DDD flowchart branch, fix denominators

- P1: Add DDD question as step 0 in choosing_estimator.rst flowchart, routing to TripleDifference (2x2x2) or StaggeredTripleDifference (staggered). Renumber subsequent steps.
- P3: Standardize estimator counts to 16 total across ROADMAP.md, llms-full.txt, and survey-roadmap.md. Replicate support is 12 of 16, survey_design support is 15 of 16.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ROADMAP.md                  |  4 ++--
 docs/choosing_estimator.rst | 20 +++++++++++++-------
 docs/llms-full.txt          |  2 +-
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 1378e191..2ed3385f 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,10 +28,10 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di

 ### Survey Support

-`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 14 of 15 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:
+`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 15 of 16 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:

 - **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC
-- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators
+- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 16 estimators (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
 - **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based)
 - **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates**
 - **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 24f32e85..e76efa40 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -12,30 +12,36 @@ Decision Flowchart

 Start here and follow the questions:

-0. **Is treatment continuous?** (Units receive different doses or intensities)
+0. **Is this a triple-difference (DDD) design?** (Two criteria for treatment: e.g., policy adoption AND group eligibility)

    - **No** → Go to question 1
-   - **Yes** → Use :class:`~diff_diff.ContinuousDiD`
+   - **Yes, simultaneous treatment (2×2×2)** → Use :class:`~diff_diff.TripleDifference`
+   - **Yes, with staggered timing** → Use :class:`~diff_diff.StaggeredTripleDifference`

-1. **Is treatment staggered?** (Different units treated at different times)
+1. **Is treatment continuous?** (Units receive different doses or intensities)

    - **No** → Go to question 2
+   - **Yes** → Use :class:`~diff_diff.ContinuousDiD`
+
+2. **Is treatment staggered?** (Different units treated at different times)
+
+   - **No** → Go to question 3
    - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)
    - **Yes, and you suspect homogeneous effects** → Use :class:`~diff_diff.ImputationDiD` or :class:`~diff_diff.TwoStageDiD` for tighter CIs
    - **Yes, with nonlinear outcome (binary/count)** → Use :class:`~diff_diff.WooldridgeDiD` with ``method='logit'`` or ``method='poisson'``
    - **Want to diagnose TWFE bias?** → Use :class:`~diff_diff.BaconDecomposition` first

-2. **Do you have panel data?** (Multiple observations per unit over time)
+3. **Do you have panel data?** (Multiple observations per unit over time)

    - **No** → Use :class:`~diff_diff.DifferenceInDifferences` (basic 2x2)
-   - **Yes** → Go to question 3
+   - **Yes** → Go to question 4

-3. **Do you need period-specific effects?** (Event study design)
+4. **Do you need period-specific effects?** (Event study design)

    - **No** → Use :class:`~diff_diff.TwoWayFixedEffects`
    - **Yes** → Use :class:`~diff_diff.MultiPeriodDiD`

-4. **Is your treated group small?** (Few treated units, many controls)
+5. **Is your treated group small?** (Few treated units, many controls)

    - Consider :class:`~diff_diff.SyntheticDiD` for better pre-treatment fit

diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index a947268f..ebb4de4a 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -1570,7 +1570,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'

 **Key features:**
 - Taylor Series Linearization (TSL) variance with strata + PSU + FPC
-- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 15 estimators)
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 16 estimators)
 - Survey-aware bootstrap: multiplier at PSU or Rao-Wu rescaled
 - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
 - Repeated cross-sections: `CallawaySantAnna(panel=False)`

From 7736619592b227e5fb9132aaa0b907cb558e3b6f Mon Sep 17 00:00:00 2001
From: igerber
Date: Sun, 5 Apr 2026 09:43:48 -0400
Subject: [PATCH 6/6] Address AI review round 5: OLS vs ASF distinction, DDD tree ordering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1: WooldridgeDiD OLS described as direct saturated-regression coefficients, ASF only for logit/Poisson — fixed across ROADMAP.md, choosing_estimator.rst, llms-full.txt, llms.txt
- P1: Practitioner decision tree now checks DDD before staggered/simple-2x2, matching choosing_estimator.rst flowchart
- P2: Fix dead SDDD API link in llms.txt (point to api/index.html)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ROADMAP.md                  |  2 +-
 docs/choosing_estimator.rst |  2 +-
 docs/llms-full.txt          |  2 +-
 docs/llms-practitioner.txt  | 18 +++++++++---------
 docs/llms.txt               |  6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 2ed3385f..495160df 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,7 +16,7 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 - **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024)
 - **Specialized**: Synthetic DiD (Arkhangelsky et al. 2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP
 - **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates
-- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — OLS, logit, and Poisson QMLE with ASF-based ATT and delta-method SEs
+- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — saturated OLS (direct cohort x time coefficients), logit, and Poisson QMLE (ASF-based ATT with delta-method SEs)

 ### Inference & Diagnostics

diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index e76efa40..77a8fd26 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -115,7 +115,7 @@ Quick Reference
    * - ``WooldridgeDiD``
      - Nonlinear outcomes or saturated OLS
      - Conditional parallel trends
-     - ASF-based ATT (OLS, logit, Poisson)
+     - OLS: direct coefficients; logit/Poisson: ASF-based ATT
    * - ``BaconDecomposition``
      - TWFE diagnostic
      - (diagnostic tool)
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index ebb4de4a..1191f25b 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -716,7 +716,7 @@ results.print_summary()

 ### WooldridgeDiD

-Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. Supports OLS, logit, and Poisson QMLE paths with Average Structural Function (ASF) based ATT and delta-method standard errors.
+Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. OLS path uses direct saturated-regression coefficients for ATT(g,t). Logit and Poisson QMLE paths use Average Structural Function (ASF) based ATT with delta-method standard errors.

 ```python
 WooldridgeDiD(
diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt
index d92de93e..194b68f7 100644
--- a/docs/llms-practitioner.txt
+++ b/docs/llms-practitioner.txt
@@ -139,6 +139,13 @@ large a violation would need to be to overturn your results.
 Use this decision tree to select the appropriate estimator:

 ```
+Is this a triple-difference (DDD) design? (Two criteria: e.g., policy + eligibility)
+|-- YES, simultaneous (2x2x2): TripleDifference (DDD)
+|-- YES, staggered timing: StaggeredTripleDifference (SDDD)
+|
+Is treatment continuous (doses/intensities)?
+|-- YES: ContinuousDiD (CDiD)
+|
 Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- YES: Do NOT use plain TWFE. Use one of:
 |   |-- CallawaySantAnna (CS) -- most general, doubly robust, recommended default
@@ -158,15 +165,8 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- Suspected factor confounding / interactive fixed effects?
 |   \-- TROP -- triply robust with nuclear norm factor adjustment
 |
-|-- Continuous treatment (doses)?
-|   \-- ContinuousDiD (CDiD)
-|
-|-- Nonlinear outcome (binary, count)?
-|   \-- WooldridgeDiD (ETWFE) -- logit or Poisson QMLE with ASF-based ATT
-|
-\-- Two eligibility criteria (DDD)?
-    |-- Simultaneous treatment (2x2x2): TripleDifference (DDD)
-    \-- Staggered timing + eligibility: StaggeredTripleDifference (SDDD)
+\-- Nonlinear outcome (binary, count)?
+    \-- WooldridgeDiD (ETWFE) -- logit or Poisson QMLE with ASF-based ATT
 ```

 Always run BaconDecomposition first if using TWFE, to check for negative
diff --git a/docs/llms.txt b/docs/llms.txt
index e29ffeb9..45cb2a75 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -63,8 +63,8 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
 - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
 - [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
-- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered_triple_diff.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
-- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS, logit, Poisson QMLE with ASF-based ATT. Alias: ETWFE
+- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/index.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
+- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias: ETWFE
 - [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html): Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings

 ## Diagnostics and Sensitivity Analysis
@@ -93,7 +93,7 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [14 Continuous DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/14_continuous_did.html): Continuous treatment DiD — dose-response curves, ATT(d), ACRT, B-splines, event study diagnostics
 - [15 Efficient DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/15_efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD — optimal weighting, PT-All vs PT-Post, efficiency gains
 - [16 Survey DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/16_survey_did.html): Survey-weighted DiD — SurveyDesign, strata/PSU/FPC, replicate weights, subpopulation analysis, DEFF diagnostics
-- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS/logit/Poisson, ASF-based ATT, aggregation types
+- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson (ASF-based ATT), aggregation types

 ## Survey Support