diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index e38b418e..1528000f 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -455,6 +455,9 @@ sources: - path: docs/methodology/REGISTRY.md section: "Survey" type: methodology + - path: docs/methodology/survey-theory.md + type: methodology + note: "Design-based variance theory for modern DiD" - path: docs/survey-roadmap.md type: roadmap - path: docs/tutorials/16_survey_did.ipynb diff --git a/docs/methodology/survey-theory.md b/docs/methodology/survey-theory.md new file mode 100644 index 00000000..52e70972 --- /dev/null +++ b/docs/methodology/survey-theory.md @@ -0,0 +1,733 @@ +# Design-Based Variance Estimation for Modern DiD Estimators + +**Key references:** + +- Binder, D.A. (1983). "On the Variances of Asymptotically Normal Estimators + from Complex Surveys." *International Statistical Review* 51(3), 279--292. +- Lumley, T. (2004). "Analysis of Complex Survey Samples." *Journal of + Statistical Software* 9(8), 1--19. +- Callaway, B. & Sant'Anna, P.H.C. (2021). "Difference-in-Differences with + Multiple Time Periods." *Journal of Econometrics* 225(2), 200--230. + +**Implementation:** `diff_diff/survey.py` + +--- + +## 1. Motivation + +### 1.1. The problem: survey data violates the iid assumption + +Policy evaluations frequently rely on nationally representative surveys: +NHANES (health outcomes), ACS (demographics and housing), BRFSS (behavioral +risk factors), CPS (labor force), and MEPS (medical expenditure). These surveys +employ stratified multi-stage cluster sampling to achieve national coverage at +manageable cost. The resulting data carry two features that invalidate naive +standard errors: (i) observations within the same primary sampling unit (PSU) +are correlated, and (ii) stratification constrains the sampling variability. + +Naive standard errors --- whether heteroskedasticity-robust (HC1) or clustered +at the individual level --- treat the sample as if it were drawn by simple +random sampling. Under complex survey designs this ignores intra-cluster +correlation within PSUs, which typically inflates variance relative to SRS, and +stratification, which typically deflates it. The net effect is design-specific, +but in practice the clustering effect dominates and naive SEs understate true +sampling variance. The ratio of design-based to naive variance is the *design +effect* (DEFF); values of 2--5 are common in health and social surveys. + +This matters especially for difference-in-differences (DiD) estimation because: + +1. Treatment is often assigned at a level that aligns with PSU structure --- + state policies, county programs, school-district mandates --- so the + within-PSU correlation of treatment intensifies the design effect on + treatment-effect estimates. +2. DiD estimands involve contrasts across groups and time periods, amplifying + any distortion in variance estimation. +3. Incorrect SEs can flip significance conclusions for policy-relevant effect + sizes, undermining the credibility of program evaluations. + +### 1.2. The gap: modern DiD theory assumes iid sampling + +The modern DiD literature derives estimators and their asymptotic properties +under sampling assumptions that are incompatible with complex survey designs. +Every foundational paper in this literature either assumes iid sampling +explicitly, or adopts a framework that sidesteps sampling design entirely: + +- **Callaway & Sant'Anna (2021)** state iid as a numbered assumption + (Assumption 2) and derive the multiplier bootstrap under it. The paper + acknowledges design-based inference as an alternative --- citing Athey & + Imbens (2018) --- but does not pursue it. +- **Sant'Anna & Zhao (2020)** assume iid (Assumption 1) and derive the doubly + robust influence function and semiparametric efficiency bounds under it. +- **Borusyak, Jaravel & Spiess (2024)** adopt a conditional/fixed-design + framework that avoids random sampling assumptions altogether, conditioning on + the observation set. Their variance results do not address survey-sampling + uncertainty. +- **Sun & Abraham (2021)** maintain iid as an unstated but operative + assumption in deriving the interaction-weighted estimator. +- **de Chaisemartin & D'Haultfoeuille (2020)** assume group-level + independence (Assumption 3), which does not map to the stratified-cluster + structure of survey data. +- **Gardner (2022)** invokes standard GMM regularity conditions that + implicitly require iid or ergodic stationary data. + +The most comprehensive recent review of the DiD literature --- Roth, Sant'Anna, +Bilinski & Poe (2023), "What's Trending in Difference-in-Differences?" --- +contains no discussion of survey weights, complex survey designs, or +design-based variance estimation. + +### 1.3. The gap in software + +Existing software implementations reflect this theoretical gap. R's `did` +package (Callaway & Sant'Anna) accepts a `weightsname` parameter for point +estimation, but its multiplier bootstrap draws iid unit-level weights without +accounting for strata, PSU, or FPC. Stata's `csdid` (Rios-Avila, Sant'Anna & +Callaway) accepts `pweight` for point estimation but does not support the +`svy:` prefix --- variance estimation ignores the survey design structure. +Neither `did_multiplegt_dyn` (de Chaisemartin & D'Haultfoeuille) nor +`eventstudyinteract` (Sun & Abraham) nor `didimputation` (Borusyak, Jaravel +& Spiess) provide design-based variance. + +In all these implementations, sampling weights enter the point estimate but the +variance estimator treats data as if it were iid (or clustered at the panel +unit, not the survey PSU). + +### 1.4. Adjacent work: survey inference for causal effects + +The survey statistics literature has developed design-based variance theory for +smooth functionals (Binder 1983; Demnati & Rao 2004; Lumley 2004), and recent +work has extended this to causal inference --- but only for cross-sectional +estimators, not panel DiD: + +- **DuGoff, Schuler & Stuart (2014)** provide practical guidance on combining + propensity score methods with complex surveys using Stata's `svy:` framework, + but address cross-sectional treatment effects, not DiD. +- **Zeng, Li & Tong (2025)** derive sandwich variance for survey-weighted + propensity score estimators using influence functions --- the closest work to + the bridge we describe --- but for cross-sectional IPW/augmented weighting, + not staggered DiD. + +No published work formally derives design-based variance for the influence +functions of modern heterogeneity-robust DiD estimators. + +### 1.5. What this document provides + +This document bridges the two literatures. The core argument (Section 4) is +that modern DiD estimators are smooth functionals of the empirical distribution, +and Binder's (1983) theorem therefore guarantees that applying the +stratified-cluster variance formula to their influence function values produces +a design-consistent variance estimator. The argument is a straightforward +application of existing theory, but it has not previously been stated for the +DiD case. + +diff-diff implements this connection: it is the only package --- across R, +Stata, and Python --- that provides design-based variance estimation +(Taylor Series Linearization with strata/PSU/FPC, and replicate weight methods) +for modern heterogeneity-robust DiD estimators. + +For a code walkthrough, see the +[survey tutorial](https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb). +For the compatibility matrix showing which estimators support which survey +features, see the [Survey Design Support](../choosing_estimator.rst#survey-design-support) +section. + +--- + +## 2. Setup and Notation + +### Finite population and survey design + +Consider a finite population U = {1, ..., N}. The population is partitioned +into H non-overlapping strata. Within stratum h, there are N_h PSUs in the +population, of which n_h are sampled. Within each sampled PSU, observations +are either fully enumerated or sub-sampled. This describes the standard +stratified multi-stage design used by most federal statistical agencies. + +### Sampling weights + +Each sampled observation i carries a sampling weight w_i = 1 / pi_i, where +pi_i is the inclusion probability. Under probability-weight (`pweight`) +semantics, w_i represents how many population units observation i represents. +diff-diff normalizes probability weights to mean 1 (sum = n) to avoid scale +dependence in regression coefficients while preserving the relative +representativeness of each observation. + +### Finite population correction + +The sampling fraction in stratum h is f_h = n_h / N_h. When f_h is close to +1, most of the finite population has been observed and sampling variability is +reduced. The finite population correction factor (1 - f_h) enters the variance +formula to account for this. + +### Notation summary + +| Symbol | Definition | +|--------|-----------| +| U = {1, ..., N} | Finite population | +| H | Number of strata | +| n_h | Number of sampled PSUs in stratum h | +| N_h | Total PSUs in stratum h (for FPC) | +| f_h = n_h / N_h | Sampling fraction in stratum h | +| w_i = 1 / pi_i | Sampling weight for observation i | +| F | Population distribution | +| F_hat_w | Survey-weighted empirical distribution | +| T(F) | Target functional (estimand) | +| theta_hat = T(F_hat_w) | Plug-in estimate | +| psi_i = IF(x_i; T, F) | Influence function value for observation i | + +### Target estimand + +The estimand is theta = T(F), where T is a functional mapping a distribution +to a real number (or vector). For DiD, T extracts treatment effects --- +average treatment effects on the treated (ATTs) --- from the joint distribution +of outcomes, treatment status, and time. The abstraction of theta as a +functional of F is what lets us bridge survey statistics and DiD: both +literatures reason about functionals, just from different perspectives. + +--- + +## 3. Survey-Weighted Estimation + +### Horvitz-Thompson consistency + +Under the survey design, the survey-weighted empirical distribution is: + +``` +F_hat_w = sum_i w_i * delta_{x_i} / sum_i w_i +``` + +where the sum is over sampled observations and delta_{x_i} is the point mass +at x_i. When T is a smooth functional, the plug-in estimator theta_hat = +T(F_hat_w) is design-consistent for theta = T(F): as the sample size grows +within the finite-population asymptotic framework, theta_hat converges in +probability to theta. + +### Regression-based estimators + +For regression-based estimators (DifferenceInDifferences, TwoWayFixedEffects, +MultiPeriodDiD, SunAbraham, StackedDiD, ContinuousDiD), the point estimates +solve weighted estimating equations. The WLS formulation minimizes: + +``` +sum_i w_i * (Y_i - X_i' beta)^2 +``` + +which yields the weighted normal equations: + +``` +sum_i w_i * X_i * (Y_i - X_i' beta) = 0 +``` + +The implementation passes sqrt(w_i)-transformed data to `solve_ols()` in +`diff_diff/linalg.py`. + +### Influence-function-based estimators + +For IF-based estimators (CallawaySantAnna, ImputationDiD, TwoStageDiD, +EfficientDiD, TripleDifference, StaggeredTripleDifference), point estimates +are constructed from survey-weighted sample moments. For example, +CallawaySantAnna with `estimation_method='reg'` computes: + +``` +ATT(g,t) = sum_{i in G_g} w_i * Delta_Y_i / sum_{i in G_g} w_i + - sum_{i in C} w_i * Delta_Y_i / sum_{i in C} w_i +``` + +Every step replaces simple sample averages (1/n) sum_i with weighted averages +(sum_i w_i)^{-1} sum_i w_i. For doubly-robust and IPW variants, the same +principle applies to the propensity score estimation (via survey-weighted +logistic regression) and outcome regression. + +### When is weighting appropriate? + +Solon, Haider & Wooldridge (2015) discuss when weighting is appropriate for +causal inference. Under design-based inference --- the perspective adopted by +diff-diff --- survey weights are needed to ensure that treatment effect +estimates correspond to the finite population, not just the sample. Without +weights, ATT estimates reflect the sample composition, which may +over-represent certain strata due to the sampling design. + +--- + +## 4. Influence Functions and DiD + +This section presents the core argument: why design-based variance estimation +is valid for modern DiD estimators. The argument proceeds in five steps. + +### 4.1. Influence functions are properties of the functional + +The influence function (IF) of a functional T at distribution F is the Gateaux +derivative: + +``` +IF(x; T, F) = lim_{eps -> 0} [T((1-eps)F + eps * delta_x) - T(F)] / eps +``` + +This is a property of the map T and the distribution F. It does not depend on +how the sample was drawn. The same functional T has the same IF regardless of +whether the data come from simple random sampling, stratified sampling, or +cluster sampling. The IF characterizes each observation's first-order +contribution to the estimator. + +### 4.2. Modern DiD estimators are smooth functionals + +Each modern DiD estimator can be written as theta = T(F) for a smooth +functional T that admits an influence function representation. The key +estimators and their smoothness arguments: + +- **CallawaySantAnna (reg):** T(F) involves population means of outcomes + within group-time cells. Sample means are smooth functionals of F. + +- **CallawaySantAnna (dr/ipw):** T(F) additionally involves a propensity + score model (smooth in population moments) and outcome regression (smooth in + population moments). Sant'Anna & Zhao (2020) derive the full IF, including + nuisance-function corrections. + +- **SunAbraham:** T(F) is a linear functional of interaction-weighted + regression coefficients, which are themselves smooth functionals of F via the + implicit function theorem applied to the normal equations. + +- **ImputationDiD:** T(F) involves OLS on untreated observations (smooth), + counterfactual imputation (linear in coefficients), and averaging treatment- + minus-imputed residuals (smooth). The IF follows from Theorem 3 of Borusyak, + Jaravel & Spiess (2024). + +- **EfficientDiD:** T(F) involves population means and covariances within + cohort-time cells. The efficient influence function (EIF) is derived in the + original paper. + +- **ContinuousDiD:** T(F) involves B-spline regression coefficients, smooth + functionals of F (Callaway, Goodman-Bacon & Sant'Anna 2024). + +- **TripleDifference:** T(F) extends the two-group DiD sandwich to a triple + contrast. The IF follows by the same arguments as DifferenceInDifferences. + +- **StaggeredTripleDifference:** Staggered DDD with IF-based aggregation + across group-time-subgroup cells. Smooth by the same logic as + CallawaySantAnna. + +- **TwoStageDiD:** Gardner (2022) two-stage imputation. The IF captures + uncertainty from both the first-stage regression and the second-stage + contrast. + +- **WooldridgeDiD:** Poisson or OLS regression with saturated interaction + terms. Smooth via the estimating-equation representation. *Note:* survey + design support is not yet implemented for this estimator. + +- **StackedDiD:** Q-weighted regression on stacked sub-experiments. Smooth in + the population moments of each sub-experiment. + +The common thread: all these estimators reduce to combinations of weighted +means, regression coefficients, and smooth transformations thereof. Each admits +an IF representation. + +### 4.2a. Where the IF chain does not apply + +Two estimators in diff-diff --- **SyntheticDiD** and **TROP** --- involve +non-smooth optimization steps (synthetic control weight selection, optimal +transport maps) that do not fit cleanly into the smooth-functional framework. +Their survey support is limited to bootstrap-only variance estimation: the +bootstrap resamples PSUs within strata (Rao-Wu rescaled), bypassing the need +for an IF. For SyntheticDiD, each draw re-runs the full estimator on resampled +data. For TROP, per-observation treatment effects (tau_it) are deterministic +given the data and do not depend on survey weights, so the Rao-Wu path +precomputes tau values once and only varies the ATT aggregation weights across +draws (see REGISTRY.md for the documented optimization). The TSL/IF-based +argument in this document does not extend to these estimators. + +### 4.3. Under survey weighting, the same IF form applies + +Under survey weighting, we replace F with F_hat_w (the survey-weighted +empirical distribution). The estimator becomes theta_hat = T(F_hat_w). Because +the IF is a property of T, not the sampling design, the first-order von Mises +expansion is: + +``` +T(F_hat_w) - T(F) = sum_i d_i * psi_i + o_p(n^{-1/2}) +``` + +where d_i = 1 if unit i is sampled (0 otherwise), and psi_i = w_i * IF(x_i; +T, F) / N is the scaled influence function value. The key observation: this +linearized form is a weighted sum over the sampled observations, and its +variance is determined by the sampling design --- not by T. The IF transforms +the problem of estimating Var(theta_hat) into the simpler problem of estimating +the variance of a weighted total. + +### 4.4. Binder's (1983) result + +Binder (1983) formalized this insight. The key result: for any smooth +functional T, the design-based variance of theta_hat = T(F_hat_w) can be +consistently estimated by applying the standard stratified-cluster variance +formula to the per-unit IF values psi_i. Specifically: + +``` +V_hat(theta_hat) = sum_h (1 - f_h) * (n_h / (n_h - 1)) + * sum_{j=1}^{n_h} (psi_hj - psi_h_bar)^2 +``` + +where psi_hj = sum_{i in PSU j, stratum h} psi_i is the PSU-level total of IF +values, and psi_h_bar is the within-stratum mean of PSU totals. + +This works because theta_hat is asymptotically equivalent to a linear function +of survey-weighted totals. Once linearized via the IF, the variance of +theta_hat inherits the same structure as the variance of a Horvitz-Thompson +total, which the survey statistics literature has established formulas for. + +### 4.5. Combining the pieces + +The chain of reasoning: + +1. Modern DiD estimators are smooth functionals of F (Section 4.2). +2. Their IFs are well-defined and do not depend on the sampling design + (Section 4.1). +3. Under survey weighting, the estimator theta_hat = T(F_hat_w) has a + first-order expansion in terms of the same IF values (Section 4.3). +4. Binder (1983) shows that applying the stratified-cluster variance formula + to these IF values gives a consistent variance estimator (Section 4.4). + +Therefore: plugging the IF values from any modern DiD estimator into the +stratified-cluster variance formula produces a design-consistent variance +estimator. This is exactly what diff-diff implements. + +The argument requires that each DiD estimator satisfies the regularity +conditions for Binder's theorem (existence of a continuous IF, remainder +term of order o_p(n^{-1/2})). For regression-based estimators, this follows +from the implicit function theorem applied to the estimating equations. For +doubly-robust estimators, this follows from the semiparametric theory of +Sant'Anna & Zhao (2020). For imputation estimators, the IF from Theorem 3 of +Borusyak et al. (2024) satisfies these conditions. + +--- + +## 5. Taylor Series Linearization (TSL) Variance + +### Regression-based TSL sandwich + +For regression-based estimators, the TSL variance-covariance matrix is the +stratified cluster sandwich (Binder 1983): + +``` +V_TSL = (X'WX)^{-1} [sum_h V_h] (X'WX)^{-1} +``` + +This is the standard sandwich estimator with the "meat" computed at the PSU +level within strata. The implementation is `compute_survey_vcov()` in +`diff_diff/survey.py`. + +### Stratum-level meat + +The variance contribution from stratum h is: + +``` +V_h = (1 - f_h) * (n_h / (n_h - 1)) * sum_{j=1}^{n_h} (T_hj - T_h_bar)(T_hj - T_h_bar)' +``` + +where: +- T_hj = sum_{i in PSU j, stratum h} w_i * X_i * u_i is the PSU-level score + total (with u_i = Y_i - X_i' beta the residual), +- T_h_bar = (1/n_h) sum_j T_hj is the within-stratum mean of PSU-level scores, +- (1 - f_h) is the finite population correction, +- n_h / (n_h - 1) is the small-sample degrees-of-freedom adjustment. + +The total meat is sum_h V_h, computed by `_compute_stratified_psu_meat()` in +`diff_diff/survey.py`. + +### IF-based TSL variance + +For scalar IF-based estimators (CallawaySantAnna, ImputationDiD, TwoStageDiD, +TripleDifference, StaggeredTripleDifference, EfficientDiD), the variance is +computed directly from per-unit influence function values without the bread +matrix: + +``` +V_design = sum_h (1 - f_h) * (n_h / (n_h - 1)) * sum_{j=1}^{n_h} (psi_hj - psi_h_bar)^2 +``` + +where psi_hj = sum_{i in PSU j, stratum h} psi_i is the PSU-level total of +IF values. This is the same formula as the meat in the regression sandwich, but +applied directly to the scalar IF values rather than to score vectors. The +implementation is `compute_survey_if_variance()` in `diff_diff/survey.py`. + +**Residual-scale vs. score-scale.** These two functions accept inputs at +different scales. `compute_survey_vcov()` takes residuals on the original +scale (u_i = Y_i - X_i' beta) and multiplies by w_i internally to form +scores. `compute_survey_if_variance()` takes score-scale psi_i values +directly --- weights are already baked in. To see the connection: when +`compute_survey_vcov()` is called with X = [1]' and residuals = eif (raw +efficient influence function values), it internally forms scores = w_i * eif_i +and produces sandwich = (sum w)^{-2} * meat(w * eif). The scalar IF function +`compute_survey_if_variance(psi)` produces meat(psi) directly. These are +equivalent when psi_i = w_i * eif_i / sum(w) --- i.e., when the IF values +are on score-scale. EfficientDiD exploits this: the TSL path passes raw EIF +values to `compute_survey_vcov()` (which handles scaling), while the replicate +path explicitly converts to score-scale via psi = w * eif / sum(w) before +calling `compute_replicate_if_variance()`. + +### Degrees of freedom + +Inference uses the t-distribution with survey degrees of freedom: + +| Design | df | +|--------|-----| +| Explicit PSU + strata | n_PSU - n_strata | +| Explicit PSU, no strata | n_PSU - 1 | +| Replicate weights | rank(W_rep) - 1 | +| No survey structure | n - 1 | + +For replicate weights, the degrees of freedom are computed via the QR +rank of the analysis-weight matrix, matching R's `survey::degf()`. + +### Singleton stratum handling + +When a stratum contains only one sampled PSU (n_h = 1), the within-stratum +variance is undefined (division by n_h - 1 = 0). diff-diff provides three +strategies via the `lonely_psu` parameter: + +- **"remove"**: Skip singleton strata and emit a warning. The variance estimate + excludes these strata entirely. +- **"certainty"**: Treat singleton PSUs as sampled with certainty (f_h = 1), + contributing zero to the variance. +- **"adjust"**: Center the singleton stratum's PSU total at the grand mean of + all PSU totals instead of the (undefined) within-stratum mean. + +--- + +## 6. Replicate Weight Variance + +### Motivation + +Replicate weights provide an alternative to TSL. Instead of linearizing the +estimator, they perturb the weights and observe the resulting variation in +estimates. This approach is useful when: + +1. The survey agency provides pre-computed replicate weights with the + public-use file (common for ACS, CPS, NHANES). +2. The estimator is too complex for easy linearization (though the IF-based + approach in diff-diff largely eliminates this concern for smooth + functionals). + +Replicate weights are mutually exclusive with strata/PSU/FPC at the +`SurveyDesign` level: the design information is already embedded in the +replicate weight construction. + +### Supported methods + +diff-diff supports five replicate-weight methods. The general variance formula +is: + +``` +V_rep = c * sum_r s_r * (theta_r - theta_center)^2 +``` + +where theta_r is the estimate from replicate r, theta_center is either the +full-sample estimate (mse=True) or the mean of replicate estimates (mse=False), +and c and s_r are method-specific factors. + +The method-specific formulas (matching `_replicate_variance_factor()` in +`diff_diff/survey.py`): + +``` +BRR: V = (1/R) * sum_r (theta_r - theta)^2 +Fay: V = 1/[R * (1-rho)^2] * sum_r (theta_r - theta)^2 +JK1: V = (R-1)/R * sum_r (theta_r - theta)^2 +SDR: V = (4/R) * sum_r (theta_r - theta)^2 +JKn: V = sum_h [(n_h-1)/n_h] * sum_{r in h} (theta_r - theta)^2 +``` + +where R is the number of replicate columns, rho is the Fay perturbation +factor, and n_h is the number of replicates in stratum h (for JKn). + +### Replicate variance for IF-based estimators + +For regression-based estimators, `compute_replicate_vcov()` re-runs WLS for +each replicate weight column to obtain theta_r. For IF-based estimators, this +would require R complete re-fits of the estimator, which is computationally +expensive. + +diff-diff avoids this for most IF-based estimators (CallawaySantAnna, +EfficientDiD, ContinuousDiD, TripleDifference, StaggeredTripleDifference) +using weight-ratio rescaling: the replicate estimate is computed by +reweighting the per-unit IF values rather than re-running the estimator. +The `SurveyDesign` parameter `combined_weights` controls the interpretation: + +``` +combined_weights=True: theta_r = sum_i (w_{r,i} / w_i) * psi_i +combined_weights=False: theta_r = sum_i w_{r,i} * psi_i +``` + +When `combined_weights=True`, the replicate columns w_{r,i} already +incorporate the full-sample weight, so the ratio w_{r,i} / w_i extracts the +perturbation factor. When `combined_weights=False`, the replicate columns are +the perturbation factors directly. This rescaling is numerically exact for +smooth functionals (to first order) and avoids the cost of R re-fits. The +implementation is in `compute_replicate_if_variance()` in `diff_diff/survey.py`. + +**Exception: refit-based replicate variance.** For ImputationDiD and +TwoStageDiD, the first-stage regression (on untreated observations) must be +re-estimated with each replicate's weights to properly capture its +contribution to variance. These estimators use +`compute_replicate_refit_variance()`, which re-runs the full estimator for +each replicate column. + +--- + +## 7. What Survey Weighting Does NOT Fix + +Survey weighting addresses the sampling design. It does not resolve every +threat to valid causal inference with DiD. Practitioners should be aware of +the following limitations. + +**Parallel trends.** Survey weighting ensures that treatment-effect estimates +target the correct population. It does not validate the parallel trends +assumption. Under the superpopulation model, parallel trends must hold for the +population, not just the sample. If parallel trends fail in the population, +survey-weighted estimates remain biased --- with correctly estimated standard +errors around the wrong estimand. Use `HonestDiD` for sensitivity analysis. + +**Small-cluster asymptotics.** TSL variance requires at least 2 PSUs per +stratum (n_h >= 2). With few PSUs per stratum --- common in some state-based +surveys --- the t-distribution approximation with df = n_PSU - n_strata may +be anti-conservative. diff-diff reports the survey degrees of freedom so users +can assess this directly. + +**Informative sampling.** Binder's theorem assumes non-informative sampling: +selection into the sample depends only on design variables (strata, PSU), not +on potential outcomes conditional on those variables. If treatment effects vary +with selection probability in ways not captured by the stratification, IF +values may be biased even after weighting. + +**SUTVA.** Survey weighting does not address interference between units. If +treatment of one unit affects outcomes of another (spillovers), the ATT +estimand is not well-defined regardless of the variance estimator. + +**Weight variability.** Highly variable weights reduce effective sample size. +The design effect DEFF = n * sum(w_i^2) / (sum(w_i))^2 measures this: when +DEFF >> 1, estimates are less precise than the nominal sample size suggests. +diff-diff reports DEFF in `SurveyMetadata` to help users assess this. + +**Model misspecification.** For doubly-robust and IPW estimators +(CallawaySantAnna with `estimation_method='dr'` or `'ipw'`), the IF +corrections for propensity score and outcome regression uncertainty assume +correct specification of at least one nuisance model. Survey weighting does not +rescue a badly specified propensity score or outcome model. + +--- + +## 8. Implementation in diff-diff + +### Two variance paths + +diff-diff provides two variance estimation strategies for survey data: + +1. **Taylor Series Linearization (TSL):** Uses strata, PSU, and FPC to compute + the stratified-cluster sandwich. Available for all estimators with + analytical (non-bootstrap) survey variance. +2. **Replicate weights:** Uses pre-computed replicate weight columns (BRR, Fay, + JK1, JKn, SDR). Available where indicated in the compatibility matrix. + +These are mutually exclusive at the `SurveyDesign` level. + +### Estimator survey variance dispatch + +Each estimator uses one of three variance strategies under survey designs: + +| Estimator | Variance path | Notes | +|-----------|--------------|-------| +| DifferenceInDifferences | TSL sandwich | OLS-based, all weight types | +| TwoWayFixedEffects | TSL sandwich | OLS-based, all weight types | +| MultiPeriodDiD | TSL sandwich | OLS-based, all weight types | +| CallawaySantAnna | TSL on IFs | pweight only | +| SunAbraham | TSL sandwich | OLS-based, all weight types | +| TripleDifference | TSL on IFs | pweight only | +| StaggeredTripleDifference | TSL on IFs | pweight only | +| ImputationDiD | TSL on IFs | pweight only | +| TwoStageDiD | TSL on IFs | pweight only | +| EfficientDiD | TSL on EIFs | all weight types | +| ContinuousDiD | TSL sandwich | all weight types | +| StackedDiD | TSL sandwich | pweight only | +| SyntheticDiD | Bootstrap only | Not IF-amenable (Section 4.2a) | +| TROP | Bootstrap only | Not IF-amenable (Section 4.2a) | +| BaconDecomposition | Diagnostic only | Weighted descriptives, no inference | + +For the definitive compatibility matrix including replicate weight and survey +bootstrap support, see the +[Survey Design Support](../choosing_estimator.rst#survey-design-support) section. + +### IF-based variance path in detail + +For IF-based estimators, the variance computation proceeds as: + +1. The estimator computes per-unit influence function values psi_i for each + group-time cell (g, t). +2. These are aggregated across cells with weight-influence-function (WIF) + adjustment to produce a single per-unit IF vector for the overall ATT. +3. The aggregated IF vector is passed to `compute_survey_if_variance()`, which + computes the design-based variance using `_compute_stratified_psu_meat()`. +4. For replicate weights, most IF-based estimators use + `compute_replicate_if_variance()`, which reweights the IF vector via + weight-ratio rescaling. ImputationDiD and TwoStageDiD instead use + `compute_replicate_refit_variance()`, which re-runs the full estimator + for each replicate column (see Section 6). + +### Bootstrap and survey interaction + +Two bootstrap strategies interact with survey designs: + +- **Multiplier bootstrap at PSU level** (CallawaySantAnna, ImputationDiD, + TwoStageDiD, ContinuousDiD, EfficientDiD, StaggeredTripleDifference): + Generates multiplier weights at the PSU level within strata, with FPC + scaling. Each bootstrap draw reweights the IF values. + +- **Rao-Wu rescaled bootstrap** (SunAbraham, SyntheticDiD, TROP): Draws PSUs + with replacement within strata and rescales observation weights. Each draw + re-runs the full estimator on the resampled data. + +--- + +## References + +### Survey statistics + +- Binder, D.A. (1983). "On the Variances of Asymptotically Normal Estimators + from Complex Surveys." *International Statistical Review* 51(3), 279--292. +- Demnati, A. & Rao, J.N.K. (2004). "Linearization Variance Estimators for + Survey Data." *Survey Methodology* 30(1), 17--26. +- Lumley, T. (2004). "Analysis of Complex Survey Samples." *Journal of + Statistical Software* 9(8), 1--19. +- Rao, J.N.K. & Wu, C.F.J. (1988). "Resampling Inference with Complex Survey + Data." *Journal of the American Statistical Association* 83(401), 231--241. +- Shao, J. (1996). "Resampling Methods in Sample Surveys." *Statistics* + 27(3--4), 203--237. + +### Modern DiD + +- Borusyak, K., Jaravel, X. & Spiess, J. (2024). "Revisiting Event-Study + Designs: Robust and Efficient Estimation." *Review of Economic Studies* + 91(6), 3253--3285. +- Callaway, B. & Sant'Anna, P.H.C. (2021). "Difference-in-Differences with + Multiple Time Periods." *Journal of Econometrics* 225(2), 200--230. +- Callaway, B., Goodman-Bacon, A. & Sant'Anna, P.H.C. (2024). + "Difference-in-Differences with a Continuous Treatment." NBER Working Paper + 32117. +- de Chaisemartin, C. & D'Haultfoeuille, X. (2020). "Two-Way Fixed Effects + Estimators with Heterogeneous Treatment Effects." *American Economic Review* + 110(9), 2964--2996. +- Gardner, J. (2022). "Two-Stage Differences in Differences." Working Paper. +- Roth, J., Sant'Anna, P.H.C., Bilinski, A. & Poe, J. (2023). "What's + Trending in Difference-in-Differences? A Synthesis of the Recent + Econometrics Literature." *Journal of Econometrics* 235(2), 2218--2244. +- Sant'Anna, P.H.C. & Zhao, J. (2020). "Doubly Robust Difference-in- + Differences Estimators." *Journal of Econometrics* 219(1), 101--122. +- Sun, L. & Abraham, S. (2021). "Estimating Dynamic Treatment Effects in + Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics* + 225(2), 175--199. + +### Survey-weighted causal inference (cross-sectional) + +- DuGoff, E.H., Schuler, M. & Stuart, E.A. (2014). "Generalizing + Observational Study Results: Applying Propensity Score Methods to Complex + Surveys." *Health Services Research* 49(1), 284--303. +- Solon, G., Haider, S.J. & Wooldridge, J.M. (2015). "What Are We Weighting + For?" *Journal of Human Resources* 50(2), 301--316. +- Zeng, S., Li, F. & Tong, X. (2025). "Moving toward Best Practice when + Using Propensity Score Weighting in Survey Observational Studies." + arXiv:2501.16156. diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 6e56db61..bae00ce6 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -112,10 +112,10 @@ Before broadly announcing survey capability, these items establish the theoretical and empirical foundation needed for credibility with practitioners and methodologists. -### 10a. Theory Document (HIGH priority) +### 10a. Theory Document (HIGH priority) ✅ -Write `docs/methodology/survey-theory.md` laying out the formal argument -for design-based variance estimation with modern DiD influence functions: +`docs/methodology/survey-theory.md` lays out the formal argument for +design-based variance estimation with modern DiD influence functions: 1. Modern heterogeneity-robust DiD estimators (CS, SA, BJS) are smooth functionals of the weighted empirical distribution