Add survey theory document (Phase 10a) by igerber · Pull Request #273 · igerber/diff-diff

igerber · 2026-04-05T14:05:18Z

Summary

Create docs/methodology/survey-theory.md — formal justification for design-based variance estimation (TSL, replicate weights) with modern heterogeneity-robust DiD influence functions
Update docs/doc-deps.yaml to track the new methodology file under diff_diff/survey.py
Mark Phase 10a complete in docs/survey-roadmap.md

Methodology references (required if estimator / math changes)

Method name(s): Taylor Series Linearization (Binder 1983), replicate weight variance (Rao & Wu 1988, Shao 1996), DiD influence functions (Callaway & Sant'Anna 2021, Sant'Anna & Zhao 2020, Borusyak et al. 2024)
Paper / source link(s): All cited in the document's References section with full bibliographic entries
Any intentional deviations from the source (and why): None — this is a documentation-only change that describes existing implementation

Validation

Tests added/updated: No test changes (documentation only)
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Create docs/methodology/survey-theory.md, a formal justification for why design-based variance estimation (TSL, replicate weights) is valid when combined with modern heterogeneity-robust DiD influence functions.

The document bridges two literatures that have developed in parallel: survey statistics (Binder 1983, Rao & Wu 1988) and modern DiD (Callaway & Sant'Anna 2021, Sant'Anna & Zhao 2020). The core argument is that modern DiD estimators are smooth functionals whose IFs are properties of the functional, not the sampling design, so Binder's theorem applies.

Includes a lit review showing that all foundational DiD papers assume iid sampling, existing software (R did, Stata csdid) ignores survey design for variance, and no prior work formally derives this combination.

Also updates doc-deps.yaml and marks Phase 10a complete in survey-roadmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-05T14:12:13Z

Overall Assessment
⚠️ Needs changes

This is a docs-only PR, but the new theory document is itself the deliverable here. I found two P1 methodology inaccuracies in the variance discussion, plus one P2 support-matrix mismatch.

Executive Summary

The new survey theory doc states an incorrect equivalence between compute_survey_vcov() and compute_survey_if_variance() for scalar IFs; the code weights residuals internally, so the residual-scale object and the score-scale IF are being conflated.
The replicate-weight section conflates generic replicate-weight variance with Rao-Wu resampling, omits the documented combined_weights=False branch, and overstates that IF-based replicate support always uses compute_replicate_if_variance().
The bootstrap dispatch list incorrectly includes TripleDifference under PSU-level multiplier bootstrap, while the published support matrix says survey bootstrap is not supported there.
I did not find code-quality, performance, or security issues beyond these documentation-accuracy problems.

Methodology
Affected methods: survey TSL/IF variance generally, plus replicate-weight variance for CallawaySantAnna, ContinuousDiD, EfficientDiD, TripleDifference, ImputationDiD, and TwoStageDiD, and survey bootstrap dispatch for TripleDifference and TROP.

Severity P1: The scalar TSL equivalence in the new theory doc is mathematically wrong. docs/methodology/survey-theory.md:L447-L450 says compute_survey_vcov() with X = 1 and residuals = psi reduces to the scalar IF formula, but compute_survey_vcov() multiplies residuals by w_i internally before forming the meat. The actual equivalence is between compute_survey_vcov(X=1, residuals=eif) and compute_survey_if_variance(psi_scaled) with score-scale psi_scaled = w * eif / sum(w), which is exactly how EfficientDiD handles the replicate path. Impact: the theory document now gives the wrong variance identity and the wrong scaling rationale for future implementations/refactors. Concrete fix: rewrite that paragraph to distinguish residual-scale EIF values from score-scale psi. Refs: diff_diff/survey.py:L1428-L1451, diff_diff/survey.py:L1463-L1507, diff_diff/efficient_did.py:L1129-L1141.
Severity P1: The replicate-weight sections describe the wrong methodology surface. docs/methodology/survey-theory.md:L533-L544 presents theta_r = sum((w_r / w_i) * psi_i) as the general IF replicate formula and labels it “Rao-Wu reweighting,” but the registry and implementation support both combined_weights=True and combined_weights=False. The dispatch text in docs/methodology/survey-theory.md:L632-L641 is also too broad: ImputationDiD and TwoStageDiD currently use compute_replicate_refit_variance(), not IF reweighting. Impact: the document now misstates both the supported replicate-weight math and which estimators actually use the IF shortcut. Concrete fix: rename this as generic replicate-weight rescaling, document both combined_weights cases, and explicitly carve out the refit-based replicate paths for ImputationDiD and TwoStageDiD. Refs: docs/methodology/REGISTRY.md:L2376-L2385, diff_diff/survey.py:L1707-L1742, diff_diff/imputation.py:L588-L674, diff_diff/two_stage.py:L572-L688.
Severity P2: The bootstrap section lists TripleDifference as supporting PSU-level multiplier bootstrap under survey designs, but the library’s support matrix says survey bootstrap is unavailable for that estimator. Impact: readers will infer an unsupported inference path exists. Concrete fix: remove TripleDifference from the multiplier-bootstrap list in the new doc, or add/ship that support and update the central support matrix consistently. Refs: docs/methodology/survey-theory.md:L647-L650, docs/choosing_estimator.rst:L636-L645, diff_diff/triple_diff.py:L352-L405.
Severity P3: The TROP paragraph says Rao-Wu survey bootstrap “re-runs the full estimator for each replicate,” but the registry already documents the implemented deviation: the code reuses precomputed tau_it and only reweights ATT aggregation, which is mathematically equivalent. Because that deviation is documented in REGISTRY.md, this is informational rather than blocking. Impact: minor doc drift against the registry. Concrete fix: mirror the registry note or link to it. Refs: docs/methodology/survey-theory.md:L325-L331, docs/methodology/REGISTRY.md:L2355-L2359.
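The scaling point in the first P1 finding can be checked numerically. The sketch below is not the library's code: `toy_survey_vcov_intercept` and `toy_survey_if_variance` are hypothetical stand-ins that assume a single stratum, with-replacement PSU sampling, and no finite population correction. It only illustrates that a path which weights residuals internally agrees with an IF path once the IF is put on the score scale `w * eif / sum(w)`.

```python
import numpy as np

def psu_totals(values, psu):
    """Sum per-observation values within each PSU."""
    _, idx = np.unique(psu, return_inverse=True)
    return np.bincount(idx, weights=values)

def toy_design_meat(scores, psu):
    """Single-stratum, with-replacement meat: G/(G-1) * sum_g (t_g - tbar)^2."""
    t = psu_totals(scores, psu)
    g = t.size
    return g / (g - 1) * np.sum((t - t.mean()) ** 2)

def toy_survey_vcov_intercept(eif, w, psu):
    """Hypothetical stand-in for the regression-style path with X = 1.

    Takes residual-scale values, multiplies by w internally, and
    applies the sandwich with bread 1 / sum(w).
    """
    scores = w * eif              # weighting happens inside this path
    bread = 1.0 / w.sum()
    return bread * toy_design_meat(scores, psu) * bread

def toy_survey_if_variance(psi, psu):
    """Hypothetical stand-in for the IF path: expects score-scale psi."""
    return toy_design_meat(psi, psu)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 200
psu = rng.integers(0, 20, size=n)
w = rng.uniform(0.5, 3.0, size=n)
eif = rng.normal(size=n)

v_reg = toy_survey_vcov_intercept(eif, w, psu)
v_if = toy_survey_if_variance(w * eif / w.sum(), psu)   # score-scale input
v_wrong = toy_survey_if_variance(eif, psu)              # residual-scale input
```

Under these assumptions `v_reg` and `v_if` agree up to floating-point error, while `v_wrong` does not, which is the conflation the finding describes.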

Code Quality

No findings in scope; this PR does not modify runtime code.

Performance

No findings in scope.

Maintainability

Severity P3: The new doc hard-codes internal survey.py line numbers, which will drift after routine edits. Impact: the document will go stale for non-methodological reasons. Concrete fix: reference function names or docs anchors instead of raw source line numbers. Refs: docs/methodology/survey-theory.md:L409-L445, docs/methodology/survey-theory.md:L512-L544.

Tech Debt

Severity P3: docs/survey-roadmap.md marks Phase 10a complete before the theory doc is fully aligned with the current implementation. Impact: the roadmap overstates completion. Concrete fix: keep the checkmark only if the documentation fixes above land in this PR; otherwise revert it for now. Refs: docs/survey-roadmap.md:L115-L123.

Security

No findings in scope; docs-only change and no secrets/PII are visible in the diff.

Documentation/Tests

No additional findings beyond the methodology documentation issues above. No code tests were required for a docs-only PR.

Path to Approval

Correct the scalar TSL/IF equivalence paragraph so it uses residual-scale EIF values in compute_survey_vcov() and score-scale psi = w * EIF / sum(w) in compute_survey_if_variance().
Rewrite the replicate-weight sections to distinguish generic replicate-weight variance from Rao-Wu bootstrap, document both combined_weights modes, and note that ImputationDiD and TwoStageDiD currently use compute_replicate_refit_variance().
Remove TripleDifference from the PSU multiplier-bootstrap list in survey-theory.md or add that support and align docs/choosing_estimator.rst and the codebase consistently.
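The two combined_weights modes in the second item can be sketched as follows. This is a hypothetical illustration, not compute_replicate_if_variance() itself: it assumes the flag follows the convention used by R's survey package (True means the replicate columns are final weights, False means they are perturbation factors to be multiplied by the full-sample weight), that deviations are centred on the full-sample value, and that the method-specific scale factor is passed in by the caller.

```python
import numpy as np

def toy_replicate_if_variance(psi, w, rep, combined_weights=True, scale=1.0):
    """Hypothetical sketch of IF-based replicate-weight variance.

    psi : score-scale IF values computed under the full-sample weights w
    rep : (n, R) array of replicate columns
    """
    psi = np.asarray(psi, dtype=float)
    w = np.asarray(w, dtype=float)
    rep = np.asarray(rep, dtype=float)
    if combined_weights:
        # Columns are final replicate weights: recover the ratio w_r / w,
        # leaving zero where the full-sample weight is zero.
        ratio = np.divide(rep, w[:, None],
                          out=np.zeros_like(rep), where=w[:, None] > 0)
    else:
        # Columns are already perturbation factors.
        ratio = rep
    theta_full = psi.sum()
    theta_r = ratio.T @ psi       # one rescaled estimate per replicate
    return scale * np.sum((theta_r - theta_full) ** 2)

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
n, R = 50, 8
w = rng.uniform(1.0, 4.0, size=n)
psi = rng.normal(size=n) / n
factors = rng.uniform(0.0, 2.0, size=(n, R))
rep_combined = w[:, None] * factors

v_combined = toy_replicate_if_variance(psi, w, rep_combined, combined_weights=True)
v_factors = toy_replicate_if_variance(psi, w, factors, combined_weights=False)
```

Both calls describe the same design, so under the stated assumptions they return the same value. Refit-based paths such as the one noted for ImputationDiD and TwoStageDiD re-estimate under each replicate instead and are not covered by this shortcut.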

- P1: Correct scalar TSL/IF equivalence paragraph to distinguish residual-scale EIF (compute_survey_vcov input) from score-scale psi (compute_survey_if_variance input), matching EfficientDiD's actual usage
- P1: Rewrite replicate-weight section to document both combined_weights modes, carve out ImputationDiD/TwoStageDiD refit path, and drop incorrect "Rao-Wu reweighting" label
- P2: Remove TripleDifference from PSU multiplier bootstrap list to match choosing_estimator.rst compatibility matrix
- P3: Fix TROP description to note precomputed tau optimization per REGISTRY.md
- P3: Replace all hard-coded survey.py line numbers with function names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-05T14:26:25Z

/ai-review

github-actions · 2026-04-05T14:30:45Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5c4f428fdce2a9842ffda7faa33aced9b868aea9

Overall Assessment

✅ Looks good

Executive Summary

This re-review resolves the prior P1/P2 documentation mismatches in the new survey theory doc.
The scalar TSL/IF discussion now correctly distinguishes residual-scale inputs to compute_survey_vcov() from score-scale psi inputs to compute_survey_if_variance(), matching docs/methodology/survey-theory.md:L450, diff_diff/survey.py:L1403, and diff_diff/efficient_did.py:L1119.
The replicate-weight section now matches the Methodology Registry and implementation on combined_weights semantics and the refit-based exceptions for ImputationDiD and TwoStageDiD, in docs/methodology/survey-theory.md:L539, docs/methodology/REGISTRY.md:L2376, diff_diff/survey.py:L1707, diff_diff/imputation.py:L587, and diff_diff/two_stage.py:L569.
The survey-bootstrap support text no longer overstates TripleDifference support and now aligns with the central support matrix in docs/methodology/survey-theory.md:L673 and docs/choosing_estimator.rst:L636.
The supporting doc updates in docs/doc-deps.yaml:L452 and docs/survey-roadmap.md:L115 are consistent with a completed Phase 10a documentation deliverable.

Methodology

No findings. Affected methods reviewed: survey TSL variance, replicate-weight variance, and survey-bootstrap dispatch across CallawaySantAnna, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, TripleDifference, StaggeredTripleDifference, SunAbraham, SyntheticDiD, and TROP. The changed text now matches the implementation and registry, including the documented TROP deviation at docs/methodology/REGISTRY.md:L2355.

Code Quality

No findings. No runtime code changed.

Performance

No findings. Docs-only PR.

Maintainability

No findings. Adding the theory doc to docs/doc-deps.yaml:L452 improves drift tracking for diff_diff/survey.py.

Tech Debt

No findings. No new deferred-work item is needed, and marking Phase 10a complete in docs/survey-roadmap.md:L115 is reasonable.

Security

No findings. Docs-only change and no secrets or PII are visible in scope.

Documentation/Tests

No findings. No code-path changes were introduced, so the absence of new tests is acceptable here. Residual risk is limited to future documentation drift, partially mitigated by docs/doc-deps.yaml:L452.

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 5, 2026

igerber merged commit 7777644 into main Apr 5, 2026

igerber deleted the survey-theory-doc branch April 5, 2026 14:34

igerber merged 2 commits into main from survey-theory-doc
