From ec223cd26064199e9645eda7b902d2b211ea2658 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 08:14:39 -0400
Subject: [PATCH 1/6] Update documentation for v2.9.0: add WooldridgeDiD,
 refresh roadmap and survey docs

- ROADMAP.md: full rewrite reflecting v2.9.0 status, Phase 10 active work
- survey-roadmap.md: add Phase 10 credibility items, simplify structure, fix stale refs
- llms-full.txt: add WooldridgeDiD, StaggeredTripleDifference, survey support sections
- llms.txt: add WooldridgeDiD, SDDD, survey section, missing tutorials
- llms-practitioner.txt: add WooldridgeDiD to estimator decision tree
- choosing_estimator.rst: add WooldridgeDiD/TripleDiff/SDDD to table and flowchart,
  remove stale EfficientDiD covariates+survey note
- doc-deps.yaml: add WooldridgeDiD group and source mappings
- TODO.md: remove 8 resolved items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md                  | 120 ++++----
 TODO.md                     |  10 -
 docs/choosing_estimator.rst |  18 +-
 docs/doc-deps.yaml          |  26 +-
 docs/llms-full.txt          | 170 +++++++++++
 docs/llms-practitioner.txt  |   6 +-
 docs/llms.txt               |  18 +-
 docs/survey-roadmap.md      | 545 +++++++++++++-----------------------
 8 files changed, 482 insertions(+), 431 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 7dfa6fb5..68824d8d 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6,62 +6,71 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
 ---
 
-## Current Status
+## Current Status (v2.9.0)
 
-diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination:
+diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** that no R or Python package matches.
 
-- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
-- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
-- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
-- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
-- **Study design**: Power analysis tools
-- **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
-- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix.
-- **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
+### Estimators
 
----
+- **Core**: Basic DiD, TWFE, MultiPeriod event study
+- **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024)
+- **Specialized**: Synthetic DiD (Arkhangelsky et al. 2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP
+- **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates
+- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — OLS, logit, and Poisson QMLE with ASF-based ATT and delta-method SEs
+
+### Inference & Diagnostics
 
-## Near-Term Enhancements (v2.8)
+- Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
+- Parallel trends tests, placebo tests, Goodman-Bacon decomposition
+- Honest DiD sensitivity analysis (Rambachan & Roth 2023), pre-trends power analysis (Roth 2022)
+- Power analysis and simulation-based MDE tools
+- EPV diagnostics for propensity score estimation
 
-### Survey Phase 7: Completing the Survey Story
+### Survey Support
 
-Close the remaining gaps for practitioners using major population surveys
-(ACS, CPS, BRFSS, MEPS). See [survey-roadmap.md](docs/survey-roadmap.md) for
-full details.
+`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 15 estimators accept survey weights; design-based variance estimation varies by estimator:
 
-- **CS Covariates + IPW/DR + Survey** *(Implemented)*: DRDID nuisance IF
-  corrections (PS + OR) under survey weights for all estimation methods.
-- **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for
-  CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020,
-  Section 4). Supports BRFSS, ACS annual, CPS monthly.
-- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating
-  the full workflow with realistic survey data.
-- **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full
-  event-study VCV propagated to sensitivity analysis, with bootstrap/replicate
-  diagonal fallback.
+- **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC
+- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators
+- **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based)
+- **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates**
+- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
+- **R cross-validation**: 15 tests against R's `survey` package using NHANES, RECS, and API datasets
 
-### Staggered Triple Difference (DDD) *(Implemented)*
+See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix, and [survey-roadmap.md](docs/survey-roadmap.md) for implementation details.
 
-`StaggeredTripleDifference` estimator for staggered adoption DDD settings.
+**Gap**: WooldridgeDiD does not yet accept `survey_design`. Planned for Phase 10f.
 
-- Group-time ATT(g,t) for DDD designs with variation in treatment timing
-- Event study aggregation and pre-treatment placebo effects
-- Multiplier bootstrap for valid inference in staggered settings
-- Full survey support (pweight, strata/PSU/FPC, replicate weights)
+### Infrastructure
 
-**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`.
+- Optional Rust backend for accelerated computation
+- Label-gated CI (tests run only when `ready-for-ci` label is added)
+- Documentation dependency map (`docs/doc-deps.yaml`) with `/docs-impact` skill
+- AI practitioner guardrails based on Baker et al. (2025) 8-step workflow
 
 ---
 
-## Medium-Term Enhancements
+## Active Work: Survey Academic Credibility (Phase 10)
 
-### Efficient DiD Estimators
+Before broadly announcing survey capability, we are establishing the theoretical
+and empirical foundation needed for credibility with practitioners and
+methodologists. See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs.
 
-Semiparametrically efficient versions of existing DiD/event-study estimators with 40%+ precision gains over current methods.
+| Item | Priority | Status |
+|------|----------|--------|
+| **10a.** Theory document (`survey-theory.md`) | HIGH | Not started |
+| **10b.** Research-grade survey DGP (enhance `generate_survey_did_data`) | HIGH | Not started |
+| **10c.** Expand R validation (ImputationDiD, StackedDiD, SunAbraham, TripleDifference) | HIGH | Not started |
+| **10d.** Tutorial: flat-weight vs design-based comparison | HIGH | Not started — depends on 10b |
+| **10e.** Position paper / arXiv preprint | MEDIUM | Not started — depends on 10b |
+| **10f.** WooldridgeDiD survey support (OLS + logit + Poisson) | MEDIUM | Not started |
+| **10g.** Practitioner guidance: when does survey design matter? | LOW | Not started |
+
+---
 
-**Reference**: [Chen, Sant'Anna & Xie (2025)](https://arxiv.org/abs/2506.17729). *Working Paper*.
+## Future Estimators
 
-### de Chaisemartin-D'Haultfœuille Estimator
+### de Chaisemartin-D'Haultfoeuille Estimator
 
 Handles treatment that switches on and off (reversible treatments), unlike most other methods.
 
@@ -69,7 +78,7 @@ Handles treatment that switches on and off (reversible treatments), unlike most
 - Time-varying, heterogeneous treatment effects
 - Comparison with never-switchers or flexible control groups
 
-**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
+**Reference**: [de Chaisemartin & D'Haultfoeuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
 
 ### Local Projections DiD
 
@@ -79,13 +88,7 @@ Implements local projections for dynamic treatment effects. Doesn't require spec
 - Robust to misspecification of dynamics
 - Natural handling of anticipation effects
 
-**Reference**: Dube, Girardi, Jordà, and Taylor (2023).
-
-### Nonlinear DiD
-
-Implemented in `WooldridgeDiD` (alias `ETWFE`) — OLS, Poisson QMLE, and logit paths with ASF-based ATT. See [Tutorial 16](docs/tutorials/16_wooldridge_etwfe.ipynb).
-
-**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
+**Reference**: Dube, Girardi, Jorda, and Taylor (2023).
 
 ### Causal Duration Analysis with DiD
 
@@ -98,7 +101,7 @@ Extends DiD to duration/survival outcomes where standard methods fail (hazard ra
 
 ---
 
-## Long-Term Research Directions (v3.0+)
+## Long-Term Research Directions
 
 Frontier methods requiring more research investment.
 
@@ -110,12 +113,11 @@ Standard DiD assumes SUTVA; spatial/network spillovers violate this. Two-stage i
 
 ### Quantile/Distributional DiD
 
-Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT. Goes beyond "what's the average effect" to "who gains, who loses."
+Recover the full counterfactual distribution and quantile treatment effects (QTT), not just mean ATT.
 
 - Changes-in-Changes (CiC) identification strategy
-- QTT(τ) at user-specified quantiles
+- QTT(tau) at user-specified quantiles
 - Full counterfactual distribution function
-- Two-period foundation, then staggered extension
 
 **Reference**: [Athey & Imbens (2006)](https://onlinelibrary.wiley.com/doi/10.1111/j.1468-0262.2006.00668.x). *Econometrica*.
 
@@ -129,10 +131,6 @@ ML-powered conditional ATT — discover who benefits most from treatment using d
 
 Machine learning methods for discovering heterogeneous treatment effects in DiD settings.
 
-- Estimate treatment effect heterogeneity across covariates
-- Data-driven subgroup discovery
-- Honest confidence intervals for discovered heterogeneity
-
 **References**:
 - [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
 - Athey & Wager (2019). *Annals of Statistics*.
@@ -141,18 +139,12 @@ Machine learning methods for discovering heterogeneous treatment effects in DiD
 
 Unified framework encompassing synthetic control and regression approaches.
 
-- Nuclear norm regularization for low-rank structure
-- Bridges synthetic control (few units, many periods) and regression (many units, few periods)
-
 **Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.
 
 ### Double/Debiased ML for DiD
 
 For high-dimensional settings with many potential confounders.
 
-- ML for nuisance parameter estimation (propensity, outcome models)
-- Cross-fitting for valid inference
-
 **Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.
 
 ### Alternative Inference Methods
@@ -163,15 +155,9 @@ For high-dimensional settings with many potential confounders.
 
 ---
 
-## Infrastructure Improvements
-
-- Video tutorials and worked examples
-
----
-
 ## Contributing
 
-Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
+Interested in contributing? The Phase 10 items and future estimators are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
 
 Key references for implementation:
 - [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
diff --git a/TODO.md b/TODO.md
index cc1d164f..5afc6274 100644
--- a/TODO.md
+++ b/TODO.md
@@ -58,19 +58,11 @@ Deferred items from PR reviews that were not addressed before merge.
 |-------|----------|----|----------|
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
-| ImputationDiD survey pretrends: subpopulation approach implemented (full design with zero-padded scores). Resolved in #260. | `imputation.py` | #260 | Resolved |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
-| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
-| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
-| CallawaySantAnna survey + covariates + IPW/DR — **Resolved**. DRDID panel nuisance IF corrections (PS + OR) implemented for both survey and non-survey DR paths (Phase 7a). IPW path unblocked. | `staggered.py` | #233 | Resolved |
-| SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
-| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
 | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
-| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
 | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
-| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
 | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
 | StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium |
 | StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium |
@@ -184,8 +176,6 @@ Features in R's `did` package that block porting additional tests:
 
 | Feature | R tests blocked | Priority | Status |
 |---------|----------------|----------|--------|
-| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | High | **Resolved** — Phase 7b: `panel=False` on CallawaySantAnna |
-| Sampling/population weights | 7 tests incl. all JEL replication | Medium | **Resolved** (Phases 1-6 + 7a: CS IPW/DR + covariates + survey) |
 | Calendar time aggregation | 1 test in test-att_gt.R | Low | |
 
 ---
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 34bbf612..a74269cc 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -22,6 +22,7 @@ Start here and follow the questions:
    - **No** → Go to question 2
    - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)
    - **Yes, and you suspect homogeneous effects** → Use :class:`~diff_diff.ImputationDiD` or :class:`~diff_diff.TwoStageDiD` for tighter CIs
+   - **Yes, with nonlinear outcome (binary/count)** → Use :class:`~diff_diff.WooldridgeDiD` with ``method='logit'`` or ``method='poisson'``
    - **Want to diagnose TWFE bias?** → Use :class:`~diff_diff.BaconDecomposition` first
 
 2. **Do you have panel data?** (Multiple observations per unit over time)
@@ -97,6 +98,18 @@ Quick Reference
      - Factor confounding suspected
      - Factor model + weights
      - ATT with triple robustness
+   * - ``TripleDifference``
+     - Two eligibility criteria (DDD)
+     - Parallel trends for both dimensions
+     - DDD ATT (regression, IPW, or DR)
+   * - ``StaggeredTripleDifference``
+     - Staggered DDD with treatment timing
+     - Conditional parallel trends (DDD)
+     - Group-time ATT(g,t), aggregations
+   * - ``WooldridgeDiD``
+     - Nonlinear outcomes or saturated OLS
+     - Conditional parallel trends
+     - ASF-based ATT (OLS, logit, Poisson)
    * - ``BaconDecomposition``
      - TWFE diagnostic
      - (diagnostic tool)
@@ -678,11 +691,6 @@ estimation. The depth of support varies by estimator:
 - **Diagnostic**: Weighted descriptive statistics only (no inference)
 - **--**: Not supported
 
-.. note::
-
-   ``EfficientDiD`` does not support ``covariates`` and ``survey_design``
-   simultaneously (the DR nuisance path does not yet thread survey weights).
-
 .. note::
 
    ``SyntheticDiD`` with ``variance_method='placebo'`` does not support
diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml
index b89c06a7..e38b418e 100644
--- a/docs/doc-deps.yaml
+++ b/docs/doc-deps.yaml
@@ -55,6 +55,9 @@ groups:
   stacked_did:
     - diff_diff/stacked_did.py
     - diff_diff/stacked_did_results.py
+  wooldridge:
+    - diff_diff/wooldridge.py
+    - diff_diff/wooldridge_results.py
   visualization:
     - diff_diff/visualization/__init__.py
     - diff_diff/visualization/_common.py
@@ -328,7 +331,28 @@ sources:
       - path: docs/choosing_estimator.rst
         type: user_guide
 
-  # ── TROP (trop group) ──────────────���───────────────────────────────
+  # ── WooldridgeDiD (wooldridge group) ────────────────────────────────
+
+  diff_diff/wooldridge.py:
+    drift_risk: medium
+    docs:
+      - path: docs/methodology/REGISTRY.md
+        section: "WooldridgeDiD"
+        type: methodology
+      - path: docs/api/wooldridge_etwfe.rst
+        type: api_reference
+      - path: docs/tutorials/16_wooldridge_etwfe.ipynb
+        type: tutorial
+      - path: README.md
+        section: "WooldridgeDiD"
+        type: user_guide
+      - path: docs/llms-full.txt
+        section: "WooldridgeDiD"
+        type: user_guide
+      - path: docs/choosing_estimator.rst
+        type: user_guide
+
+  # ── TROP (trop group) ──────────────────────────────────────────────
 
   diff_diff/trop.py:
     drift_risk: medium
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index d45b2c8c..6f6e1a6b 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -660,6 +660,121 @@ results.print_summary()
 plot_bacon(results)
 ```
 
+### StaggeredTripleDifference
+
+Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD estimator for designs with two eligibility criteria and staggered treatment timing.
+
+```python
+StaggeredTripleDifference(
+    estimation_method: str = "dr",          # "dr", "ipw", or "reg"
+    control_group: str = "notyettreated",   # "nevertreated" or "notyettreated"
+    alpha: float = 0.05,
+    anticipation: int = 0,
+    base_period: str = "varying",           # "varying" or "universal"
+    n_bootstrap: int = 0,
+    bootstrap_weights: str = "rademacher",
+    seed: int | None = None,
+    cband: bool = True,
+    pscore_trim: float = 0.01,
+    cluster: str | None = None,
+    rank_deficient_action: str = "warn",
+)
+```
+
+**Alias:** `SDDD`
+
+**fit() parameters:**
+
+```python
+sddd.fit(
+    data: pd.DataFrame,
+    outcome: str,
+    unit: str,
+    time: str,
+    first_treat: str,
+    eligibility: str,               # Binary eligibility column (0/1)
+    covariates: list[str] | None = None,
+    aggregate: str = "event_study",  # "simple", "group", "event_study"
+    survey_design=None,
+) -> StaggeredTripleDiffResults
+```
+
+**Usage:**
+
+```python
+from diff_diff import StaggeredTripleDifference
+
+sddd = StaggeredTripleDifference(estimation_method='dr', n_bootstrap=999)
+results = sddd.fit(data, outcome='y', unit='id', time='t',
+                   first_treat='ft', eligibility='eligible',
+                   aggregate='event_study')
+results.print_summary()
+```
+
+### WooldridgeDiD
+
+Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. Supports OLS, logit, and Poisson QMLE paths with Average Structural Function (ASF) based ATT and delta-method standard errors.
+
+```python
+WooldridgeDiD(
+    method: str = "ols",                    # "ols", "logit", or "poisson"
+    control_group: str = "not_yet_treated", # "not_yet_treated" or "never_treated"
+    anticipation: int = 0,                  # Number of anticipation periods
+    demean_covariates: bool = True,         # Demean covariates within cohort*period cells
+    alpha: float = 0.05,
+    cluster: str | None = None,
+    n_bootstrap: int = 0,
+    bootstrap_weights: str = "rademacher",
+    seed: int | None = None,
+    rank_deficient_action: str = "warn",
+)
+```
+
+**Alias:** `ETWFE`
+
+**fit() parameters:**
+
+```python
+etwfe.fit(
+    data: pd.DataFrame,
+    outcome: str,
+    unit: str,
+    time: str,
+    cohort: str,                    # First treatment period (0 or NaN = never treated)
+    exovar: list[str] | None = None,  # Time-invariant covariates (no interaction/demeaning)
+    xtvar: list[str] | None = None,   # Time-varying covariates (demeaned within cohort*period)
+    xgvar: list[str] | None = None,   # Covariates interacted with each cohort indicator
+) -> WooldridgeDiDResults
+```
+
+**Aggregation types:**
+
+```python
+results.aggregate("simple")    # Overall ATT
+results.aggregate("group")     # ATT by cohort
+results.aggregate("calendar")  # ATT by calendar period
+results.aggregate("event")     # ATT by event time (relative to treatment)
+```
+
+**Usage:**
+
+```python
+from diff_diff import WooldridgeDiD
+
+# OLS (linear outcomes)
+etwfe = WooldridgeDiD(method='ols')
+results = etwfe.fit(data, outcome='y', unit='id', time='t', cohort='first_treat')
+print(results.aggregate("simple"))
+
+# Logit (binary outcomes)
+etwfe_logit = WooldridgeDiD(method='logit')
+results = etwfe_logit.fit(data, outcome='y_bin', unit='id', time='t', cohort='first_treat')
+
+# Poisson (count outcomes)
+etwfe_pois = WooldridgeDiD(method='poisson')
+results = etwfe_pois.fit(data, outcome='y_count', unit='id', time='t', cohort='first_treat')
+```
+
 ### Convenience Functions
 
 ```python
@@ -1405,6 +1520,61 @@ data = load_card_krueger(force_download=True)
 clear_cache()
 ```
 
+## Survey Support
+
+All estimators except `WooldridgeDiD` accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation.
+
+```python
+from diff_diff import SurveyDesign, CallawaySantAnna
+
+# Create survey design with strata, PSU, FPC
+sd = SurveyDesign(
+    weights='weight',           # Sampling weight column
+    strata='stratum',           # Stratification variable
+    psu='psu',                  # Primary Sampling Unit (cluster)
+    fpc='fpc',                  # Finite Population Correction
+    weight_type='pweight',      # "pweight", "fweight", or "aweight"
+    nest=False,                 # PSU IDs nested within strata?
+    lonely_psu='remove',        # "remove", "certainty", or "adjust"
+)
+
+# Fit with survey design
+cs = CallawaySantAnna(estimation_method='dr')
+results = cs.fit(data, outcome='y', unit='id', time='t',
+                 first_treat='ft', survey_design=sd)
+
+# Survey metadata on results
+print(results.survey_metadata.deff)         # Design effect
+print(results.survey_metadata.effective_n)  # Effective sample size
+print(results.survey_metadata.df_survey)    # Survey degrees of freedom
+```
+
+**Replicate weight designs:**
+
+```python
+sd = SurveyDesign(
+    weights='weight',
+    replicate_weights=[f'rep_{i}' for i in range(80)],  # Column names rep_0 through rep_79
+    replicate_method='SDR',     # "BRR", "Fay", "JK1", "JKn", or "SDR"
+)
+```
+
+**Subpopulation analysis:**
+
+```python
+sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F')
+```
+
+**Key features:**
+- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 15 estimators)
+- Survey-aware bootstrap: multiplier at PSU or Rao-Wu rescaled
+- DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
+- Repeated cross-sections: `CallawaySantAnna(panel=False)`
+- Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section
+
+No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators.
+
 ## Linear Algebra Helpers
 
 ```python
diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt
index 08a682e8..53803eec 100644
--- a/docs/llms-practitioner.txt
+++ b/docs/llms-practitioner.txt
@@ -146,7 +146,8 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |   |-- ImputationDiD (BJS)    -- most efficient under homogeneous effects
 |   |-- TwoStageDiD (Gardner)  -- two-stage with GMM variance
 |   |-- StackedDiD (Stacked)   -- sub-experiment approach
-|   \-- EfficientDiD (EDiD)    -- optimal weighting for tighter SEs
+|   |-- EfficientDiD (EDiD)    -- optimal weighting for tighter SEs
+|   \-- WooldridgeDiD (ETWFE)  -- nonlinear outcomes (logit/Poisson) or saturated OLS
 |
 |-- NO, simple 2x2 design:
 |   \-- DifferenceInDifferences (DiD)
@@ -160,6 +161,9 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- Continuous treatment (doses)?
 |   \-- ContinuousDiD (CDiD)
 |
+|-- Nonlinear outcome (binary, count)?
+|   \-- WooldridgeDiD (ETWFE)  -- logit or Poisson QMLE with ASF-based ATT
+|
 \-- Two eligibility criteria?
     \-- TripleDifference (DDD)
 ```
diff --git a/docs/llms.txt b/docs/llms.txt
index fced72b7..44fda813 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -2,7 +2,7 @@
 
 > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis.
 
-diff-diff offers 14 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
+diff-diff offers 15 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
 
 - Install: `pip install diff-diff`
 - License: MIT
@@ -63,6 +63,8 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
 - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
 - [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
+- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered_triple_diff.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
+- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS, logit, Poisson QMLE with ASF-based ATT. Alias: ETWFE
 - [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html): Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings
 
 ## Diagnostics and Sensitivity Analysis
@@ -90,6 +92,20 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [13 Stacked DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/13_stacked_did.html): Stacked DiD — sub-experiments, Q-weights, event windows, trimming, clean control definitions
 - [14 Continuous DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/14_continuous_did.html): Continuous treatment DiD — dose-response curves, ATT(d), ACRT, B-splines, event study diagnostics
 - [15 Efficient DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/15_efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD — optimal weighting, PT-All vs PT-Post, efficiency gains
+- [16 Survey DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/16_survey_did.html): Survey-weighted DiD — SurveyDesign, strata/PSU/FPC, replicate weights, subpopulation analysis, DEFF diagnostics
+- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS/logit/Poisson, ASF-based ATT, aggregation types
+
+## Survey Support
+
+All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
+
+- **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
+- **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
+- **Diagnostics**: DEFF per coefficient, effective n, subpopulation analysis, weight trimming, CV on estimates
+- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
+- **Compatibility matrix**: [Survey Design Support](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support)
+
+No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only.
 
 ## Optional
 
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index ed0fc046..287597b1 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -1,385 +1,238 @@
 # Survey Data Support Roadmap
 
 This document captures the survey data support roadmap for diff-diff.
-Phases 1-8f are implemented. Phase 8g (documentation-only items) is partially
-addressed. Remaining deferred items are listed at the bottom.
-
-## Implemented (Phases 1-2)
-
-- **SurveyDesign** class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu
-- **Weighted OLS** via sqrt(w) transformation in `solve_ols()`
-- **Weighted sandwich estimator** in `compute_robust_vcov()` (pweight/fweight/aweight)
-- **Taylor Series Linearization** (TSL) variance with strata + PSU + FPC
-- **Survey degrees of freedom**: n_PSU - n_strata
-- **Base estimator integration**: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD
-- **Weighted demeaning**: `demean_by_group()` and `within_transform()` with weights
-- **SurveyMetadata** in results objects with effective n, DEFF, weight_range
-
-## Implemented (Phase 3): OLS-Based Standalone Estimators
-
-| Estimator | File | Survey Support | Notes |
-|-----------|------|----------------|-------|
-| StackedDiD | `stacked_did.py` | pweight only | Q-weights compose multiplicatively with survey weights; TSL vcov on composed weights; fweight/aweight rejected (composition changes weight semantics) |
-| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap via Rao-Wu rescaled (Phase 6) |
-| BaconDecomposition | `bacon.py` | Diagnostic | Weighted cell means, weighted within-transform, weighted group shares; no inference (diagnostic only) |
-| TripleDifference | `triple_diff.py` | Full | Regression, IPW, and DR methods with weighted OLS/logit + TSL on influence functions |
-| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap via multiplier at PSU (Phase 6) |
-| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap via multiplier at PSU (Phase 6) |
-
-### Phase 3 Deferred Work
-
-The following capabilities were deferred from Phase 3 because they depend on
-Phase 5 infrastructure (bootstrap+survey interaction):
-
-| Estimator | Deferred Capability | Blocker |
-|-----------|-------------------|---------|
-| SunAbraham | Pairs bootstrap + survey | **Resolved** (Phase 6, Rao-Wu rescaled) |
-| ContinuousDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
-| EfficientDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
-| EfficientDiD | Covariates (DR path) + survey | **Resolved**: survey weights threaded through DR nuisance estimation |
-
-All blocked combinations raise `NotImplementedError` when attempted, with a
-message pointing to the planned phase or describing the limitation.
-
-## Implemented (Phase 4): Complex Standalone Estimators + Weighted Logit
-
-| Estimator | File | Survey Support | Notes |
-|-----------|------|----------------|-------|
-| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap via multiplier at PSU (Phase 6) |
-| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap via multiplier at PSU (Phase 6) |
-| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR supports covariates (Phase 7a); survey-weighted WIF in aggregation; replicate IF variance for analytical SEs |
-
-**Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights
-enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked
-TripleDifference IPW/DR from Phase 3 deferred work.
-
-### Phase 4 Deferred Work
-
-| Estimator | Deferred Capability | Blocker |
-|-----------|-------------------|---------|
-| ImputationDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
-| TwoStageDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
-| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
-| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) |
-| CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) |
-| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Deferred — code uses conservative plug-in IF (see REGISTRY.md) |
-
-## Implemented (Phase 5): SyntheticDiD + TROP Survey Support
-
-| Estimator | File | Survey Support | Notes |
-|-----------|------|----------------|-------|
-| SyntheticDiD | `synthetic_did.py` | pweight | Both sides weighted (WLS interpretation): treated means survey-weighted, ω composed with control survey weights post-optimization. Placebo and bootstrap SE preserve survey weights. Covariates residualized with WLS. |
-| TROP | `trop.py` | pweight | ATT aggregation only: population-weighted average of per-observation treatment effects. Model fitting (kernel weights, LOOCV, nuclear norm) unchanged. Rust and Python bootstrap paths both support weighted ATT. |
-
-### Phase 5 Deferred Work
-
-| Estimator | Deferred Capability | Status |
-|-----------|-------------------|--------|
-| SyntheticDiD | strata/PSU/FPC + survey-aware bootstrap | Resolved in Phase 6 |
-| TROP | strata/PSU/FPC + survey-aware bootstrap | Resolved in Phase 6 |
-
-## Phase 6: Advanced Features
-
-### Bootstrap + Survey Interaction ✅ (2026-03-24)
-Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies:
-- **Multiplier at PSU level** (CS, ImputationDiD, TwoStageDiD, ContinuousDiD,
-  EfficientDiD): generate multiplier weights at PSU level within strata, FPC-scaled.
-- **Rao-Wu rescaled** (SunAbraham, SyntheticDiD, TROP): draw PSUs within strata,
-  rescale observation weights per Rao, Wu & Yue (1992).
-- **CS analytical expansion**: strata/PSU/FPC now supported for aggregated SEs via
-  `compute_survey_if_variance()`.
-- **TROP**: cross-classified pseudo-strata (survey_stratum × treatment_group).
-- **Rust TROP**: pweight-only in Rust; full design falls to Python path.
-
-### Replicate Weight Variance ✅ (2026-03-26)
-Re-run WLS for each replicate weight column, compute variance from distribution
-of estimates. Supports BRR, Fay's BRR, JK1, JKn, and SDR methods.
-JKn requires explicit `replicate_strata` (per-replicate stratum assignment).
-- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign
-- `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate)
-- `compute_replicate_if_variance()` for IF-based estimators (reweights IF)
-- Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py`
-- Replicate weights mutually exclusive with strata/PSU/FPC
-- Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()`
-- **Coverage**: 12 of 15 estimators support replicate weights.
-  Supported: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD,
-  CallawaySantAnna, TripleDifference (analytical only), StaggeredTripleDifference,
-  SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD.
-  Rejected: SyntheticDiD, TROP (no published theory on replicate weights +
-  unit weight optimization / nuclear norm regularization), BaconDecomposition
-  (diagnostic tool, no inference).
-
-### DEFF Diagnostics ✅ (2026-03-26)
-Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov.
-- `DEFFDiagnostics` dataclass with per-coefficient DEFF, effective n, SRS/survey SE
-- `compute_deff_diagnostics()` standalone function
-- `LinearRegression.compute_deff()` post-fit method
-- Display label "Kish DEFF (weights)" for existing weight-based DEFF
-
-### Subpopulation Analysis ✅ (2026-03-26)
-`SurveyDesign.subpopulation(data, mask)` — zero-out weights for excluded
-observations while preserving the full design structure for correct variance
-estimation (unlike simple subsetting, which would drop design information).
-- Mask: bool array/Series, column name, or callable
-- Returns (SurveyDesign, DataFrame) pair with synthetic `_subpop_weight` column
-- Weight validation relaxed: zero weights allowed (negative still rejected)
+Phases 1-9 are complete. Phase 10 covers the credibility and announcement
+readiness work still ahead.
 
 ---
 
-## Phase 7: Completing the Survey Story
-
-These items close the remaining gaps that matter for practitioners using major
-population surveys (ACS, CPS, BRFSS, MEPS) with modern DiD methods. Together
-they make diff-diff the only package — R or Python — with full design-based
-variance estimation for heterogeneity-robust DiD estimators.
-
-### 7a. CallawaySantAnna Covariates + IPW/DR + Survey ✅
-
-**Implemented.** DRDID panel nuisance-estimation IF corrections (PS + OR)
-for both survey and non-survey IPW/DR paths. Survey-weighted propensity
-score estimation via `solve_logit()`, survey-weighted outcome regression,
-and IFs that account for nuisance parameter estimation uncertainty under
-the survey design (Sant'Anna & Zhao 2020, Theorem 3.1).
-
-**Reference:** Sant'Anna, P.H.C. & Zhao, J. (2020). "Doubly Robust
-Difference-in-Differences Estimators." *Journal of Econometrics* 219(1).
-
-### 7b. Repeated Cross-Sections ✅
-
-**Priority: High.** Many major surveys (BRFSS, ACS annual cross-sections,
-CPS monthly) are not panels — units are not followed over time. The R `did`
-package supports `panel=FALSE` for these settings. diff-diff currently
-requires panel data for all staggered estimators.
-
-**Implemented.** `CallawaySantAnna(panel=False)` for repeated cross-section
-surveys. Uses cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4):
-`reg` matches `DRDID::reg_did_rc`, `dr` matches `DRDID::drdid_rc` (locally
-efficient with 4 OLS fits), `ipw` matches `DRDID::std_ipw_did_rc`. Survey
-weights, covariates, and all estimation methods supported.
-
-**Reference:** Sant'Anna, P.H.C. & Zhao, J. (2020). Sections 3 (panel) vs
-4 (repeated cross-sections). Callaway, B. & Sant'Anna, P.H.C. (2021).
-Section 4.1.
-
-### 7c. Survey-Aware DiD Tutorial ✅
-
-**Implemented.** `docs/tutorials/16_survey_did.ipynb` — 35-cell tutorial
-framed around a state-level preventive care program evaluated with a
-stratified health survey (ACS/BRFSS-like). Covers: why survey design
-matters, SurveyDesign setup, basic DiD with survey, staggered DiD
-(CallawaySantAnna) with survey, replicate weights (JK1), subpopulation
-analysis, DEFF diagnostics, repeated cross-sections, and a link to the
-compatibility matrix in `choosing_estimator.rst`. Uses `generate_survey_did_data()` DGP function
-added to `diff_diff.prep`.
-
-### 7d. HonestDiD with Survey Variance ✅
-
-**Implemented.** Survey df and full event-study VCV from IF vectors
-propagated to HonestDiD sensitivity analysis. When CallawaySantAnna is fit
-with a survey design, HonestDiD uses t-distribution critical values with
-survey degrees of freedom. Bootstrap/replicate designs fall back to
-diagonal VCV with a warning.
-
-**Reference:** Rambachan, A. & Roth, J. (2023). "A More Credible Approach
-to Parallel Trends." *Review of Economic Studies* 90(5).
-
-### 7e. StaggeredTripleDifference Survey Support ✅
-
-**Implemented.** Full survey design support (pweight only) for the
-Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD estimator. Survey
-weights thread through all three pairwise DiD comparisons: propensity
-score estimation (weighted IRLS via `solve_logit`), outcome regression
-(WLS via `solve_ols`), and Riesz representer computation (weighted
-Hajek normalization). IF combination weights (w1/w2/w3) use
-survey-weighted cell sizes. Per-cell SEs use the standard IF formula
-(matching CallawaySantAnna); aggregated SEs (overall, event study,
-group) use design-based variance via `compute_survey_if_variance()`
-(TSL with strata/PSU/FPC) or `compute_replicate_if_variance()`
-(replicate weights). Bootstrap uses PSU-level multiplier weights
-when survey design is present.
-
-**Note:** The R `triplediff` package does not support survey weights.
-diff-diff is the only implementation (R or Python) with design-based
-variance estimation for staggered triple differences.
-
-**Reference:** Ortiz-Villavicencio, M. & Sant'Anna, P.H.C. (2025).
-"Better Understanding Triple Differences Estimators." arXiv:2505.09942.
+## What's Shipped
+
+### Phases 1-2: Core Infrastructure
+
+- `SurveyDesign` class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu
+- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
+- Weighted OLS, sandwich estimator, demeaning, survey degrees of freedom
+- `SurveyMetadata` on results (effective n, DEFF, weight_range)
+- Base estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD
+
+### Phase 3: OLS-Based Standalone Estimators
+
+| Estimator | Survey Support | Notes |
+|-----------|----------------|-------|
+| StackedDiD | pweight only | Q-weights compose multiplicatively; fweight/aweight rejected |
+| SunAbraham | Full | Bootstrap via Rao-Wu rescaled |
+| BaconDecomposition | Diagnostic | Weighted descriptives only, no inference |
+| TripleDifference | Full | Regression, IPW, and DR methods with TSL on IFs |
+| ContinuousDiD | Full | Weighted B-spline OLS + TSL; bootstrap via multiplier at PSU |
+| EfficientDiD | Full | No-cov and DR covariate paths both survey-weighted; bootstrap via multiplier at PSU |
+
+### Phase 4: Complex Estimators + Weighted Logit
+
+| Estimator | Survey Support | Notes |
+|-----------|----------------|-------|
+| ImputationDiD | Full | Weighted iterative FE + conservative variance; bootstrap via multiplier at PSU |
+| TwoStageDiD | Full | Weighted FE + GMM sandwich; bootstrap via multiplier at PSU |
+| CallawaySantAnna | Full | Strata/PSU/FPC/replicate weights; IPW/DR covariates (Phase 7a); replicate IF variance |
+
+Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as
+`w_survey * mu * (1 - mu)`.
+
+### Phase 5: SyntheticDiD + TROP
+
+| Estimator | Survey Support | Notes |
+|-----------|----------------|-------|
+| SyntheticDiD | pweight | Treated means survey-weighted; omega composed with control weights post-optimization |
+| TROP | pweight | Population-weighted ATT aggregation; model fitting unchanged |
+
+### Phase 6: Advanced Features (v2.7.6)
+
+- **Survey-aware bootstrap** for all 8 bootstrap-using estimators:
+  multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient)
+  and Rao-Wu rescaled (SA, SyntheticDiD, TROP)
+- **Replicate weight variance**: BRR, Fay's BRR, JK1, JKn, SDR.
+  12 of 15 estimators supported (not SyntheticDiD, TROP, or BaconDecomposition)
+- **DEFF diagnostics**: per-coefficient design effects vs SRS baseline
+- **Subpopulation analysis**: `SurveyDesign.subpopulation()` preserves
+  full design structure for correct variance
+
+### Phase 7: Completing the Survey Story (v2.8.0-v2.8.1)
+
+- **7a.** CS IPW/DR covariates + survey: DRDID nuisance IF corrections
+  (Sant'Anna & Zhao 2020, Theorem 3.1)
+- **7b.** Repeated cross-sections: `CallawaySantAnna(panel=False)` matching
+  `DRDID::reg_did_rc`, `drdid_rc`, `std_ipw_did_rc`
+- **7c.** Survey tutorial: `docs/tutorials/16_survey_did.ipynb` with full
+  workflow (strata, PSU, FPC, replicates, subpopulation, DEFF)
+- **7d.** HonestDiD + survey: survey df and event-study VCV propagated
+  to sensitivity analysis with t-distribution critical values
+- **7e.** StaggeredTripleDifference survey support (only implementation
+  in R or Python with design-based DDD variance)
+
+### Phase 8: Survey Maturity (v2.8.3-v2.8.4)
+
+- **8a.** SDR replicate method for ACS PUMS (80 columns)
+- **8b.** FPC in ImputationDiD and TwoStageDiD
+- **8c.** Silent operation warnings (8 operations now emit `UserWarning`)
+- **8d.** Lonely PSU "adjust" in bootstrap (Rust & Rao 1996)
+- **8e.** CV on estimates, `trim_weights()`, survey-aware ImputationDiD pretrends
+- **8f.** Compatibility matrix in `choosing_estimator.rst`
+
+### Phase 9: Real-Data Validation (v2.9.0)
+
+15 cross-validation tests against R's `survey` package using real federal
+survey datasets:
+
+| Dataset | Design | Key result |
+|---------|--------|------------|
+| API (R `survey`) | Strata + FPC | ATT, SE, df, CI match R (7 variants incl. subpopulation, Fay's BRR) |
+| NHANES (CDC/NCHS) | Strata + PSU (nest=TRUE) | ACA DiD matches R for strata+PSU, covariates, subpopulation |
+| RECS 2020 (U.S. EIA) | 60 JK1 replicate weights | Coefficients, SEs, df, CI match R |
+
+Files: `benchmarks/R/benchmark_realdata_*.R`, `tests/test_survey_real_data.py`,
+`benchmarks/data/real/*_realdata_golden.json`
+
+### Documentation Remaining (Phase 8g)
+
+- **Multi-stage design**: not yet documented. Single-stage (strata + PSU)
+  is sufficient per Lumley (2004) Section 2.2.
+- **Post-stratification / calibration**: not yet documented. `SurveyDesign`
+  expects pre-calibrated weights. `samplics` is the most complete Python
+  option (post-stratification, raking, GREG) but is in read-only mode —
+  active development has moved to `svy`, which is not yet publicly
+  released. `weightipy` is actively maintained for raking. Weight
+  calibration is out of scope for diff-diff today, though building this
+  capability is a future possibility.
 
 ---
 
-## Phase 8: Survey Maturity
-
-Refinements to close remaining gaps versus R's `survey` package and improve
-practitioner experience. Prioritized by user impact.
+## Phase 10: Academic Credibility and Announcement Readiness
 
-### 8a. Successive Difference Replication (SDR) ✅
+Before broadly announcing survey capability, these items establish the
+theoretical and empirical foundation needed for credibility with
+practitioners and methodologists.
 
-**Shipped in v2.8.4.** ACS PUMS — the most common US survey dataset for DiD
-policy evaluation — provides 80 SDR replicate weight columns.
-`SurveyDesign(replicate_method="SDR")` with variance formula
-`V = 4/R * sum((theta_r - theta)^2)`.
+### 10a. Theory Document (HIGH priority)
 
-**Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and
-Model-Based Postcensal Estimation of Income and Poverty Characteristics
-for States and Counties." ASA Proceedings.
+Write `docs/methodology/survey-theory.md` laying out the formal argument
+for design-based variance estimation with modern DiD influence functions:
 
-### 8b. FPC in ImputationDiD and TwoStageDiD ✅
+1. Modern heterogeneity-robust DiD estimators (CS, SA, BJS) are smooth
+   functionals of the weighted empirical distribution
+2. Survey-weighted empirical distribution is design-consistent for the
+   superpopulation quantity (Horvitz-Thompson)
+3. The influence function is a property of the functional, not the
+   sampling design — IFs remain valid under survey weighting
+4. TSL (stratified cluster sandwich) and replicate-weight methods are
+   valid variance estimators for smooth functionals of survey-weighted
+   estimating equations (Binder 1983, Rao & Wu 1988, Shao 1996)
 
-**Shipped in v2.8.4.** Both estimators now have full strata/PSU/FPC
-support. FPC is threaded through the existing TSL variance path.
+This is the short-term deliverable that can be linked from docs and README
+immediately.
 
-### 8c. Silent Operation Warnings ✅
+**Key references:**
+- Binder, D.A. (1983). "On the Variances of Asymptotically Normal
+  Estimators from Complex Surveys." *International Statistical Review* 51.
+- Rao, J.N.K. & Wu, C.F.J. (1988). "Resampling Inference with Complex
+  Survey Data." *JASA* 83(401).
+- Shao, J. (1996). "Resampling Methods in Sample Surveys." *Statistics* 27.
 
-**Shipped in v2.8.3.** Eight operations that previously altered analysis
-results without informing the user now emit `UserWarning`:
-TROP lstsq fallback, TwoStageDiD NaN masking, TwoStageDiD always-treated
-removal, CallawaySantAnna (g,t) pair skipping, TROP treatment indicator
-fill, Rust → Python fallback, survey weight normalization, `np.inf` → 0
-never-treated conversion.
+### 10b. Survey Simulation DGP (HIGH priority)
 
-### 8d. Lonely PSU "adjust" in Bootstrap ✅
+Build a research-grade DGP that generates realistic complex survey data
+over a staggered treatment adoption panel. The existing `generate_survey_did_data()`
+tests code correctness but lacks the properties needed for statistical
+coverage studies and compelling tutorials. The new DGP needs:
 
-**Shipped in v2.8.4.** `lonely_psu="adjust"` now works with survey-aware
-bootstrap using Rust & Rao (1996) grand-mean centering.
+- Known stratified cluster structure with varying PSU sizes
+- Controllable intra-cluster correlation (so true DEFF is known)
+- Known treatment effects (so coverage of 95% CIs can be measured)
+- Enough design complexity to show where flat weights fail (clustering
+  inflates variance, stratification reduces it, FPC matters for small
+  populations)
 
-**Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation
-for Complex Surveys Using Replication Techniques." Statistical Methods
-in Medical Research 5(3).
+This is a dependency for both 10d (tutorial) and 10e (paper simulation
+study). Add to `diff_diff.prep` alongside the existing DGP functions.
 
-### 8e. Survey Diagnostics and Utilities ✅
+### 10c. Expand R Validation Coverage (HIGH priority)
 
-**Shipped in v2.8.4.**
-- **CV on estimates**: `coef_var` property on all results objects (SE/|estimate|).
-  Handles edge cases (SE=0, estimate=0).
-- **Weight trimming**: `trim_weights(data, weight_col, upper=None, lower=None,
-  quantile=None)` in `prep.py` for capping extreme survey weights.
-- **ImputationDiD pretrends + survey**: pre-trends F-test now survey-aware
-  using subpopulation approach for correct variance under complex designs.
+Current R-validated estimators: DifferenceInDifferences, TWFE,
+CallawaySantAnna, SyntheticDiD (4 of 15). We can validate the OLS
+regression path against R's `survey::svyglm()` for estimators that
+reduce to WLS:
 
-### 8f. Survey Compatibility Matrix ✅
+| Estimator | Validation approach | Status |
+|-----------|-------------------|--------|
+| ImputationDiD | Compare WLS step against `svyglm()` | Not started |
+| StackedDiD | Compare stacked WLS against `svyglm()` | Not started |
+| SunAbraham | Compare interaction-weighted WLS against `svyglm()` | Not started |
+| TripleDifference | Compare DDD regression against `svyglm()` | Not started |
+| EfficientDiD | No R reference exists | Deferred |
+| TROP | No R reference exists | Deferred |
 
-**Shipped.** Full compatibility table added to `docs/choosing_estimator.rst`
-(Survey Design Support section) showing estimator × survey feature
-combinations. Tutorial cross-references this table.
+### 10d. Tutorial: Show the Pain (HIGH priority)
 
-### 8g. Documentation-Only Items
+Expand the survey tutorial with a side-by-side comparison using the DGP
+from 10b:
 
-**Partially addressed.** No code changes required. Remaining items
-deferred to the consolidated list below:
-- **Multi-stage design**: document that single-stage (strata + PSU)
-  is sufficient for variance estimation per Lumley (2004) Section 2.2.
-- **Post-stratification / calibration**: document that `SurveyDesign`
-  expects pre-calibrated weights. Point users to `samplics` or R's
-  `survey::calibrate()` for weight calibration.
+- ATT with flat weights (what R's `did` package gives you)
+- ATT with full survey design (what diff-diff gives you)
+- DEFF showing how much SEs were underestimated
+- An example where inference conclusions change
 
----
+Because the DGP has known parameters, the tutorial can show not just that
+the results differ, but which one is *right*. This is the content that
+practitioners share and that converts skeptics.
 
-## Phase 9: Real-Data Validation
+### 10e. Position Paper / arXiv Preprint (MEDIUM priority, long-term)
 
-Complements the synthetic-data cross-validation (Phase 6 BRR, Tier 1-3 in
-`benchmark_survey_crossvalidation.R`) with real federal survey datasets.
-Validates that diff-diff's survey variance matches R's `survey` package
-when fed actual survey design variables from published data sources.
+A 15-25 page methodology note targeting JSSAM, simultaneously posted to
+arXiv. Theory (~5pp), simulation study using DGP from 10b (~8pp),
+empirical illustration with NHANES ACA data (~3pp), software section
+(~2pp).
 
-### Datasets
+**Ideal co-author:** Pedro Sant'Anna — derived the IFs in CS/DRDID and
+can vouch they are valid under survey weighting. The survey statistics
+(Binder 1983, Rao & Wu 1988) are established and don't need a survey
+methodologist to co-sign.
 
-| Dataset | Source | Design | Validates |
-|---------|--------|--------|-----------|
-| API (apistrat) | R `survey` package | Strata + FPC + weights | TSL variance, subpopulations, Fay's BRR |
-| NHANES 2007-08 / 2015-16 | CDC/NCHS | Strata + PSU + weights (nest=TRUE) | TSL with real clustering, ACA DiD |
-| RECS 2020 | U.S. EIA | 60 JK1 replicate weights | JK1 replicate weight variance |
+### 10f. WooldridgeDiD Survey Support (MEDIUM priority)
 
-### Results
+WooldridgeDiD (ETWFE) is the only estimator without `survey_design`
+support. The OLS path is straightforward (same as TWFE); the logit/Poisson
+paths require survey-weighted IRLS. Should be added to the compatibility
+matrix regardless of whether support is implemented before announcement.
 
-- **Suite A (API):** 8 tests — ATT, SE, df, CI match R across 7 design
-  variants including subpopulation and Fay's BRR replicates. Known
-  differences: TWFE (A4) validates ATT only (SE differs due to unit FE);
-  subpopulation (A5) validates ATT/SE but df differs (strata preservation
-  vs R's `subset()` — documented deviation in REGISTRY.md).
-- **Suite B (NHANES):** 4 tests — ATT, SE, df, CI match R for
-  strata+PSU+weights, covariates, weights-only, and female subpopulation.
-- **Suite C (RECS):** 3 tests — JK1 regression coefficients, SEs, df,
-  and CI match R with 60 real replicate weight columns. DEFF validated
-  as finite/positive (different naive baselines prevent exact comparison).
+### 10g. Practitioner Guidance (LOW priority)
 
-### Files
-
-- R scripts: `benchmarks/R/benchmark_realdata_{api,nhanes,recs}.R`
-- Download scripts: `benchmarks/scripts/download_{nhanes,recs}.py`
-- Python tests: `tests/test_survey_real_data.py`
-- Golden values: `benchmarks/data/real/*_realdata_golden.json`
+A decision flowchart helping practitioners decide whether they need full
+survey design or whether flat weights suffice. Key factors: ICC, number
+of PSUs, stratification gain, DEFF magnitude. DEFF diagnostics provide
+the empirical answer, but practitioners need guidance on interpretation.
 
 ---
 
-## Deferred Work (Consolidated)
-
-### Documented Deviations
-
-These are supported paths that use a conservative or simplified approach
-rather than the theoretically optimal one. They do not raise errors.
-
-| Estimator | Deviation | Details |
-|-----------|-----------|---------|
-| CallawaySantAnna | `reg`+covariates uses conservative plug-in IF | Efficient DRDID nuisance IF correction deferred; see REGISTRY.md |
-
-### Runtime Limitations
-
-All items below raise an error when attempted (`NotImplementedError` or
-`ValueError` depending on the estimator), with a message describing the
-limitation. See also `TODO.md` for general tech debt items (e.g.,
-multi-absorb + survey weights).
-
-### Replicate Weights Not Supported
-
-| Estimator | Reason |
-|-----------|--------|
-| SyntheticDiD | No published theory on replicate weights + unit weight optimization |
-| TROP | No published theory on replicate weights + nuclear norm regularization |
-| BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply |
-
-### EfficientDiD Survey Limitations
+## Current Limitations
 
-| Limitation | Reason |
-|-----------|--------|
-| `cluster` + `survey_design` | Use `survey_design` with PSU/strata instead |
+All items below raise an error when attempted, with a message describing
+the limitation and suggested alternative.
 
-### Bootstrap + Replicate Weights (Mutual Exclusion)
-
-Replicate weights and bootstrap are alternative variance estimation methods.
-Combining them raises `NotImplementedError` or `ValueError`:
-
-| Estimator |
-|-----------|
-| CallawaySantAnna |
-| ContinuousDiD |
-| EfficientDiD |
-| ImputationDiD |
-| StaggeredTripleDifference |
-| SunAbraham |
-| TwoStageDiD |
-
-### Other Limitations
-
-| Estimator | Limitation | Reason |
-|-----------|-----------|--------|
+| Estimator | Limitation | Alternative |
+|-----------|-----------|-------------|
+| WooldridgeDiD | No `survey_design` support | Not yet implemented (see 10f) |
+| SyntheticDiD | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
+| TROP | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
+| BaconDecomposition | Replicate weights | Diagnostic only, no inference |
 | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
-| ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented |
-| ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
-| DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead |
-
-### Warning/Fallback Behaviors
-
-These do not raise errors but silently change behavior:
-
-| Estimator | Limitation | Behavior |
-|-----------|-----------|----------|
-| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference |
-
----
-
-## Remaining Documentation Tasks (Phase 8g)
-
-These are documentation improvements, not runtime limitations:
-
-- **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed.
-- **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration.
+| ImputationDiD | `pretrends=True` + replicate weights | Use analytical survey design instead |
+| ImputationDiD | `pretrend_test()` + replicate weights | Use analytical survey design instead |
+| DiD, TWFE | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (default) |
+| EfficientDiD | `cluster` + `survey_design` | Use `survey_design` with PSU/strata |
+| All bootstrap estimators | Bootstrap + replicate weights | These are alternative variance methods; pick one |
+
+**Warning/fallback (no error):** MultiPeriodDiD with `wild_bootstrap` +
+`survey_design` warns and falls back to analytical inference.
+
+**Conservative approach (no error):** CallawaySantAnna with `reg` + covariates
+uses a conservative plug-in influence function (IF) rather than the efficient
+DRDID nuisance IF correction (see REGISTRY.md).

From 4cfd68f0512a8b880a99bbbccdcd1399cccd75e2 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 08:28:18 -0400
Subject: [PATCH 2/6] Address AI review: qualify survey claims, fix API drift
 and examples

- P1: Replace blanket "all estimators" survey claims with "14 of 15"
  or explicit WooldridgeDiD exception in ROADMAP.md, llms.txt, llms-full.txt
- P2: Fix SyntheticDiD/TROP alternative wording: "Rao-Wu rescaled bootstrap"
  not "TSL" in survey-roadmap.md
- P2: Sync StaggeredTripleDifference snippet in llms-full.txt to live
  signature (add epv_threshold, pscore_fallback, balance_e; fix aggregate default)
- P2: Fix survey examples: .design_effect not .deff, subpopulation() returns
  (SurveyDesign, DataFrame) tuple

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md             |  2 +-
 docs/llms-full.txt     | 13 ++++++++-----
 docs/llms.txt          |  2 +-
 docs/survey-roadmap.md |  4 ++--
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 68824d8d..1378e191 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,7 +28,7 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 
 ### Survey Support
 
-`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. All 15 estimators accept survey weights; design-based variance estimation varies by estimator:
+`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 14 of 15 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:
 
 - **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC
 - **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index 6f6e1a6b..77b26404 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -678,6 +678,8 @@ StaggeredTripleDifference(
     pscore_trim: float = 0.01,
     cluster: str | None = None,
     rank_deficient_action: str = "warn",
+    epv_threshold: float = 10,              # Min events-per-variable for propensity score
+    pscore_fallback: str = "error",         # "error" or "unconditional"
 )
 ```
 
@@ -694,7 +696,8 @@ sddd.fit(
     first_treat: str,
     eligibility: str,               # Binary eligibility column (0/1)
     covariates: list[str] | None = None,
-    aggregate: str = "event_study",  # "simple", "group", "event_study"
+    aggregate: str | None = None,   # None, "simple", "group", "event_study", "all"
+    balance_e: int | None = None,   # Max event time for balanced event study
     survey_design=None,
 ) -> StaggeredTripleDiffResults
 ```
@@ -1522,7 +1525,7 @@ clear_cache()
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation.
+All estimators except `WooldridgeDiD` accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation. (WooldridgeDiD survey support is planned for Phase 10f.)
 
 ```python
 from diff_diff import SurveyDesign, CallawaySantAnna
@@ -1544,7 +1547,7 @@ results = cs.fit(data, outcome='y', unit='id', time='t',
                  first_treat='ft', survey_design=sd)
 
 # Survey metadata on results
-print(results.survey_metadata.deff)         # Design effect
+print(results.survey_metadata.design_effect)  # Design effect
 print(results.survey_metadata.effective_n)  # Effective sample size
 print(results.survey_metadata.df_survey)    # Survey degrees of freedom
 ```
@@ -1562,7 +1565,7 @@ sd = SurveyDesign(
 **Subpopulation analysis:**
 
 ```python
-sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F')
+sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F')
 ```
 
 **Key features:**
@@ -1573,7 +1576,7 @@ sd_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F')
 - Repeated cross-sections: `CallawaySantAnna(panel=False)`
 - Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section
 
-No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators.
+No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. (`WooldridgeDiD` does not yet accept `survey_design` — planned for Phase 10f.)
 
 ## Linear Algebra Helpers
 
diff --git a/docs/llms.txt b/docs/llms.txt
index 44fda813..e9210a8c 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -105,7 +105,7 @@ All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesig
 - **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
 - **Compatibility matrix**: [Survey Design Support](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support)
 
-No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only.
+No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators. R's `did`, `fixest`, `synthdid`, and `didimputation` accept flat weight vectors only. (Note: `WooldridgeDiD` does not yet accept `survey_design` — planned for Phase 10f.)
 
 ## Optional
 
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 287597b1..32465c42 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -220,8 +220,8 @@ the limitation and suggested alternative.
 | Estimator | Limitation | Alternative |
 |-----------|-----------|-------------|
 | WooldridgeDiD | No `survey_design` support | Not yet implemented (see 10f) |
-| SyntheticDiD | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
-| TROP | Replicate weights | Use TSL (strata/PSU/FPC) with bootstrap |
+| SyntheticDiD | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
+| TROP | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
 | BaconDecomposition | Replicate weights | Diagnostic only, no inference |
 | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
 | ImputationDiD | `pretrends=True` + replicate weights | Use analytical survey design instead |

From 078f48e7661daabd102d3b429ce35045baf23cc1 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 08:39:31 -0400
Subject: [PATCH 3/6] Address AI review round 2: fix remaining survey claim
 inconsistencies

- P1: llms.txt survey section now says "except WooldridgeDiD" at the top
- P1: survey-roadmap.md replicate weight count updated to 12 of 16
  (adds WooldridgeDiD to unsupported list)
- P3: llms.txt estimator count corrected from 15 to 16

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/llms.txt          | 4 ++--
 docs/survey-roadmap.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/llms.txt b/docs/llms.txt
index e9210a8c..e29ffeb9 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -2,7 +2,7 @@
 
 > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis.
 
-diff-diff offers 15 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
+diff-diff offers 16 estimators covering basic 2x2 DiD, modern staggered adoption methods, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
 
 - Install: `pip install diff-diff`
 - License: MIT
@@ -97,7 +97,7 @@ Full practitioner guide: docs/llms-practitioner.txt
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
+All estimators except `WooldridgeDiD` accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
 
 - **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
 - **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 32465c42..6e56db61 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -51,7 +51,7 @@ Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as
   multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient)
   and Rao-Wu rescaled (SA, SyntheticDiD, TROP)
 - **Replicate weight variance**: BRR, Fay's BRR, JK1, JKn, SDR.
-  12 of 15 estimators supported (not SyntheticDiD, TROP, or BaconDecomposition)
+  12 of 16 estimators supported (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
 - **DEFF diagnostics**: per-coefficient design effects vs SRS baseline
 - **Subpopulation analysis**: `SurveyDesign.subpopulation()` preserves
   full design structure for correct variance

From b6b7e694254ed4b8849548d0f2e7fbff12c0b909 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 08:54:06 -0400
Subject: [PATCH 4/6] Address AI review round 3: choosing_estimator survey
 exclusion, DDD split

- P1: choosing_estimator.rst survey intro now excludes WooldridgeDiD,
  added WooldridgeDiD row to compatibility matrix with "--" across all columns
- P1: Split DDD selection guidance into TripleDifference (2x2x2) vs
  StaggeredTripleDifference (staggered timing) in llms-practitioner.txt,
  llms-full.txt, and choosing_estimator.rst comparison table
- Added WooldridgeDiD to llms-full.txt estimator selection table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/choosing_estimator.rst | 17 ++++++++++++++---
 docs/llms-full.txt          |  4 +++-
 docs/llms-practitioner.txt  |  5 +++--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index a74269cc..24f32e85 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -593,9 +593,10 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------
 
-All estimators accept an optional ``survey_design`` parameter in ``fit()``.
-Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
-estimation. The depth of support varies by estimator:
+All estimators except :class:`~diff_diff.WooldridgeDiD` accept an optional
+``survey_design`` parameter in ``fit()``. Pass a :class:`~diff_diff.SurveyDesign`
+object to get design-based variance estimation. The depth of support varies by
+estimator (WooldridgeDiD survey support is planned for Phase 10f):
 
 .. list-table::
    :header-rows: 1
@@ -676,12 +677,22 @@ estimation. The depth of support varies by estimator:
      - Via bootstrap
      - --
      - Rao-Wu rescaled
+   * - ``WooldridgeDiD``
+     - --
+     - --
+     - --
+     - --
    * - ``BaconDecomposition``
      - Diagnostic
      - Diagnostic
      - --
      - --
 
+.. note::
+
+   ``WooldridgeDiD`` does not yet accept ``survey_design``. Survey support
+   is planned for Phase 10f. See :doc:`/survey-roadmap` for details.
+
 **Legend:**
 
 - **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index 77b26404..a947268f 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -1632,7 +1632,9 @@ DIFF_DIFF_BACKEND=rust pytest     # Force Rust (fail if unavailable)
 | Few treated units / synthetic control | `SyntheticDiD` |
 | Interactive fixed effects / factor confounding | `TROP` |
 | Continuous treatment intensity | `ContinuousDiD` |
-| Two-criterion treatment (group + eligibility) | `TripleDifference` |
+| Two-criterion treatment, simultaneous (2x2x2 DDD) | `TripleDifference` |
+| Two-criterion treatment, staggered timing + eligibility | `StaggeredTripleDifference` |
+| Nonlinear outcome (binary/count) with staggered timing | `WooldridgeDiD` |
 | Diagnosing TWFE bias | `BaconDecomposition` |
 | Efficiency-optimal estimation | `EfficientDiD` |
 | Corrective weighting for stacked regressions | `StackedDiD` |
diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt
index 53803eec..d92de93e 100644
--- a/docs/llms-practitioner.txt
+++ b/docs/llms-practitioner.txt
@@ -164,8 +164,9 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- Nonlinear outcome (binary, count)?
 |   \-- WooldridgeDiD (ETWFE)  -- logit or Poisson QMLE with ASF-based ATT
 |
-\-- Two eligibility criteria?
-    \-- TripleDifference (DDD)
+\-- Two eligibility criteria (DDD)?
+    |-- Simultaneous treatment (2x2x2):  TripleDifference (DDD)
+    \-- Staggered timing + eligibility:  StaggeredTripleDifference (SDDD)
 ```
 
 Always run BaconDecomposition first if using TWFE, to check for negative

From 67714b26723227d984ebcdeacfe7711a803509fc Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 09:01:36 -0400
Subject: [PATCH 5/6] Address AI review round 4: add DDD flowchart branch, fix
 denominators

- P1: Add DDD question as step 0 in choosing_estimator.rst flowchart,
  routing to TripleDifference (2x2x2) or StaggeredTripleDifference
  (staggered). Renumber subsequent steps.
- P3: Standardize estimator counts to 16 total across ROADMAP.md,
  llms-full.txt, and survey-roadmap.md. Replicate support is 12 of 16,
  survey_design support is 15 of 16.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
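Reviewer note (not part of the commit message): the denominators standardized in this patch can be sanity-checked with a small script. This is only a sketch. The estimator names are collected from the text of this patch series, and treating them as the complete set of 16 is an assumption; `support_counts` is a hypothetical helper, not a diff-diff API.

```python
# Estimator class names as they appear in this patch series.
# Assumption: these 16 names are the full estimator list.
ESTIMATORS = {
    "DifferenceInDifferences", "TwoWayFixedEffects", "MultiPeriodDiD",
    "CallawaySantAnna", "SunAbraham", "ImputationDiD", "TwoStageDiD",
    "StackedDiD", "EfficientDiD", "ContinuousDiD", "SyntheticDiD", "TROP",
    "TripleDifference", "StaggeredTripleDifference", "WooldridgeDiD",
    "BaconDecomposition",
}

# Exceptions stated in the docs touched by this series.
NO_SURVEY_DESIGN = {"WooldridgeDiD"}
NO_REPLICATE_WEIGHTS = {
    "SyntheticDiD", "TROP", "BaconDecomposition", "WooldridgeDiD",
}


def support_counts(all_estimators, unsupported):
    """Return (supported, total); hypothetical helper for this check only."""
    unknown = unsupported - all_estimators
    if unknown:
        raise ValueError(f"unknown estimator names: {sorted(unknown)}")
    return len(all_estimators) - len(unsupported), len(all_estimators)


survey_supported = support_counts(ESTIMATORS, NO_SURVEY_DESIGN)
replicate_supported = support_counts(ESTIMATORS, NO_REPLICATE_WEIGHTS)
print("survey_design: %d of %d" % survey_supported)
print("replicate weights: %d of %d" % replicate_supported)
```

If a later patch adds an estimator or extends survey support, updating the sets above shows which "N of M" strings in the docs need to change.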
 ROADMAP.md                  |  4 ++--
 docs/choosing_estimator.rst | 20 +++++++++++++-------
 docs/llms-full.txt          |  2 +-
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 1378e191..2ed3385f 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,10 +28,10 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 
 ### Survey Support
 
-`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 14 of 15 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:
+`SurveyDesign` with strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling. 15 of 16 estimators accept `survey_design` (WooldridgeDiD support planned for Phase 10f); design-based variance estimation varies by estimator:
 
 - **TSL variance** (Taylor Series Linearization) with strata + PSU + FPC
-- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 15 estimators
+- **Replicate weights**: BRR, Fay's BRR, JK1, JKn, SDR — 12 of 16 estimators (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
 - **Survey-aware bootstrap**: multiplier at PSU (IF-based) and Rao-Wu rescaled (resampling-based)
 - **DEFF diagnostics**, **subpopulation analysis**, **weight trimming**, **CV on estimates**
 - **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 24f32e85..e76efa40 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -12,30 +12,36 @@ Decision Flowchart
 
 Start here and follow the questions:
 
-0. **Is treatment continuous?** (Units receive different doses or intensities)
+0. **Is this a triple-difference (DDD) design?** (Two criteria for treatment: e.g., policy adoption AND group eligibility)
 
    - **No** → Go to question 1
-   - **Yes** → Use :class:`~diff_diff.ContinuousDiD`
+   - **Yes, simultaneous treatment (2×2×2)** → Use :class:`~diff_diff.TripleDifference`
+   - **Yes, with staggered timing** → Use :class:`~diff_diff.StaggeredTripleDifference`
 
-1. **Is treatment staggered?** (Different units treated at different times)
+1. **Is treatment continuous?** (Units receive different doses or intensities)
 
    - **No** → Go to question 2
+   - **Yes** → Use :class:`~diff_diff.ContinuousDiD`
+
+2. **Is treatment staggered?** (Different units treated at different times)
+
+   - **No** → Go to question 3
    - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)
    - **Yes, and you suspect homogeneous effects** → Use :class:`~diff_diff.ImputationDiD` or :class:`~diff_diff.TwoStageDiD` for tighter CIs
    - **Yes, with nonlinear outcome (binary/count)** → Use :class:`~diff_diff.WooldridgeDiD` with ``method='logit'`` or ``method='poisson'``
    - **Want to diagnose TWFE bias?** → Use :class:`~diff_diff.BaconDecomposition` first
 
-2. **Do you have panel data?** (Multiple observations per unit over time)
+3. **Do you have panel data?** (Multiple observations per unit over time)
 
    - **No** → Use :class:`~diff_diff.DifferenceInDifferences` (basic 2x2)
-   - **Yes** → Go to question 3
+   - **Yes** → Go to question 4
 
-3. **Do you need period-specific effects?** (Event study design)
+4. **Do you need period-specific effects?** (Event study design)
 
    - **No** → Use :class:`~diff_diff.TwoWayFixedEffects`
    - **Yes** → Use :class:`~diff_diff.MultiPeriodDiD`
 
-4. **Is your treated group small?** (Few treated units, many controls)
+5. **Is your treated group small?** (Few treated units, many controls)
 
    - Consider :class:`~diff_diff.SyntheticDiD` for better pre-treatment fit
 
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index a947268f..ebb4de4a 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -1570,7 +1570,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
 
 **Key features:**
 - Taylor Series Linearization (TSL) variance with strata + PSU + FPC
-- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 15 estimators)
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (12 of 16 estimators)
 - Survey-aware bootstrap: multiplier at PSU or Rao-Wu rescaled
 - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
 - Repeated cross-sections: `CallawaySantAnna(panel=False)`

From 7736619592b227e5fb9132aaa0b907cb558e3b6f Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 5 Apr 2026 09:43:48 -0400
Subject: [PATCH 6/6] Address AI review round 5: OLS vs ASF distinction, DDD
 tree ordering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1: WooldridgeDiD OLS described as direct saturated-regression coefficients,
  ASF only for logit/Poisson — fixed across ROADMAP.md, choosing_estimator.rst,
  llms-full.txt, llms.txt
- P1: Practitioner decision tree now checks DDD before staggered/simple-2x2,
  matching choosing_estimator.rst flowchart
- P2: Fix dead SDDD API link in llms.txt (point to api/index.html)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
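Reviewer note (not part of the commit message): the question ordering that this patch aligns between `llms-practitioner.txt` and `choosing_estimator.rst` can be written down as a small function. This is a simplified sketch, not library code. `suggest_estimator` and its keyword names are hypothetical, it returns only the first-listed recommendation at each step, and it leaves out the alternatives (EfficientDiD, ImputationDiD, TwoStageDiD) and the small-treated-group hint for SyntheticDiD.

```python
def suggest_estimator(*, ddd=False, staggered=False, continuous=False,
                      nonlinear_outcome=False, panel=True, event_study=False):
    """Walk the flowchart questions in order and return an estimator name."""
    # Step 0: triple-difference designs are checked first.
    if ddd:
        return "StaggeredTripleDifference" if staggered else "TripleDifference"
    # Step 1: continuous treatment (doses or intensities).
    if continuous:
        return "ContinuousDiD"
    # Step 2: staggered adoption; nonlinear outcomes route to WooldridgeDiD.
    if staggered:
        return "WooldridgeDiD" if nonlinear_outcome else "CallawaySantAnna"
    # Step 3: without panel data, fall back to the basic 2x2 estimator.
    if not panel:
        return "DifferenceInDifferences"
    # Step 4: event-study designs need period-specific effects.
    return "MultiPeriodDiD" if event_study else "TwoWayFixedEffects"


print(suggest_estimator(ddd=True, staggered=True))
print(suggest_estimator(staggered=True, nonlinear_outcome=True))
```

Writing the tree this way makes the ordering explicit: a staggered DDD design never reaches the staggered-adoption branch, which is the behavior the reordered practitioner tree is meant to have.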
 ROADMAP.md                  |  2 +-
 docs/choosing_estimator.rst |  2 +-
 docs/llms-full.txt          |  2 +-
 docs/llms-practitioner.txt  | 18 +++++++++---------
 docs/llms.txt               |  6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 2ed3385f..495160df 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,7 +16,7 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 - **Heterogeneity-robust**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess Imputation (2024), Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024)
 - **Specialized**: Synthetic DiD (Arkhangelsky et al. 2021), Triple Difference, Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024), TROP
 - **Efficient**: EfficientDiD (Chen, Sant'Anna & Xie 2025) — semiparametrically efficient with doubly robust covariates
-- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — OLS, logit, and Poisson QMLE with ASF-based ATT and delta-method SEs
+- **Nonlinear**: WooldridgeDiD / ETWFE (Wooldridge 2023, 2025) — saturated OLS (direct cohort x time coefficients); logit and Poisson QMLE (ASF-based ATT with delta-method SEs)
 
 ### Inference & Diagnostics
 
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index e76efa40..77a8fd26 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -115,7 +115,7 @@ Quick Reference
    * - ``WooldridgeDiD``
      - Nonlinear outcomes or saturated OLS
      - Conditional parallel trends
-     - ASF-based ATT (OLS, logit, Poisson)
+     - OLS: direct coefficients; logit/Poisson: ASF-based ATT
    * - ``BaconDecomposition``
      - TWFE diagnostic
      - (diagnostic tool)
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index ebb4de4a..1191f25b 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -716,7 +716,7 @@ results.print_summary()
 
 ### WooldridgeDiD
 
-Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. Supports OLS, logit, and Poisson QMLE paths with Average Structural Function (ASF) based ATT and delta-method standard errors.
+Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. The OLS path uses the saturated-regression coefficients directly as ATT(g,t). The logit and Poisson QMLE paths use an Average Structural Function (ASF) based ATT with delta-method standard errors.
 
 ```python
 WooldridgeDiD(
diff --git a/docs/llms-practitioner.txt b/docs/llms-practitioner.txt
index d92de93e..194b68f7 100644
--- a/docs/llms-practitioner.txt
+++ b/docs/llms-practitioner.txt
@@ -139,6 +139,13 @@ large a violation would need to be to overturn your results.
 Use this decision tree to select the appropriate estimator:
 
 ```
+Is this a triple-difference (DDD) design? (Two criteria: e.g., policy + eligibility)
+|-- YES, simultaneous (2x2x2):   TripleDifference (DDD)
+|-- YES, staggered timing:       StaggeredTripleDifference (SDDD)
+|
+Is treatment continuous (doses/intensities)?
+|-- YES: ContinuousDiD (CDiD)
+|
 Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- YES: Do NOT use plain TWFE. Use one of:
 |   |-- CallawaySantAnna (CS)  -- most general, doubly robust, recommended default
@@ -158,15 +165,8 @@ Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- Suspected factor confounding / interactive fixed effects?
 |   \-- TROP                   -- triply robust with nuclear norm factor adjustment
 |
-|-- Continuous treatment (doses)?
-|   \-- ContinuousDiD (CDiD)
-|
-|-- Nonlinear outcome (binary, count)?
-|   \-- WooldridgeDiD (ETWFE)  -- logit or Poisson QMLE with ASF-based ATT
-|
-\-- Two eligibility criteria (DDD)?
-    |-- Simultaneous treatment (2x2x2):  TripleDifference (DDD)
-    \-- Staggered timing + eligibility:  StaggeredTripleDifference (SDDD)
+\-- Nonlinear outcome (binary, count)?
+    \-- WooldridgeDiD (ETWFE)  -- logit or Poisson QMLE with ASF-based ATT
 ```
 
 Always run BaconDecomposition first if using TWFE, to check for negative
diff --git a/docs/llms.txt b/docs/llms.txt
index e29ffeb9..45cb2a75 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -63,8 +63,8 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
 - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
 - [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
-- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered_triple_diff.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
-- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS, logit, Poisson QMLE with ASF-based ATT. Alias: ETWFE
+- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/index.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
+- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias: ETWFE
 - [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html): Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings
 
 ## Diagnostics and Sensitivity Analysis
@@ -93,7 +93,7 @@ Full practitioner guide: docs/llms-practitioner.txt
 - [14 Continuous DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/14_continuous_did.html): Continuous treatment DiD — dose-response curves, ATT(d), ACRT, B-splines, event study diagnostics
 - [15 Efficient DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/15_efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD — optimal weighting, PT-All vs PT-Post, efficiency gains
 - [16 Survey DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/16_survey_did.html): Survey-weighted DiD — SurveyDesign, strata/PSU/FPC, replicate weights, subpopulation analysis, DEFF diagnostics
-- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — OLS/logit/Poisson, ASF-based ATT, aggregation types
+- [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson (ASF-based ATT), aggregation types
 
 ## Survey Support