
Add survey theory document (Phase 10a) #273

Merged
igerber merged 2 commits into main from survey-theory-doc
Apr 5, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 5, 2026

Summary

  • Create docs/methodology/survey-theory.md — formal justification for design-based variance estimation (TSL, replicate weights) with modern heterogeneity-robust DiD influence functions
  • Update docs/doc-deps.yaml to track the new methodology file under diff_diff/survey.py
  • Mark Phase 10a complete in docs/survey-roadmap.md

Methodology references (required if estimator / math changes)

  • Method name(s): Taylor Series Linearization (Binder 1983), replicate weight variance (Rao & Wu 1988, Shao 1996), DiD influence functions (Callaway & Sant'Anna 2021, Sant'Anna & Zhao 2020, Borusyak et al. 2024)
  • Paper / source link(s): All cited in the document's References section with full bibliographic entries
  • Any intentional deviations from the source (and why): None — this is a documentation-only change that describes existing implementation

Validation

  • Tests added/updated: No test changes (documentation only)
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Create docs/methodology/survey-theory.md — a formal justification for
why design-based variance estimation (TSL, replicate weights) is valid
when combined with modern heterogeneity-robust DiD influence functions.

The document bridges two literatures that have developed in parallel:
survey statistics (Binder 1983, Rao & Wu 1988) and modern DiD (Callaway
& Sant'Anna 2021, Sant'Anna & Zhao 2020). The core argument is that
modern DiD estimators are smooth functionals whose IFs are properties of
the functional, not the sampling design, so Binder's theorem applies.

Includes a lit review showing that all foundational DiD papers assume iid
sampling, existing software (R did, Stata csdid) ignores survey design
for variance, and no prior work formally derives this combination.

Also updates doc-deps.yaml and marks Phase 10a complete in survey-roadmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 5, 2026

Overall Assessment
⚠️ Needs changes

This is a docs-only PR, but the new theory document is itself the deliverable here. I found two P1 methodology inaccuracies in the variance discussion, plus one P2 support-matrix mismatch.

Executive Summary

  • The new survey theory doc states an incorrect equivalence between compute_survey_vcov() and compute_survey_if_variance() for scalar IFs; the code weights residuals internally, so the residual-scale object and the score-scale IF are being conflated.
  • The replicate-weight section conflates generic replicate-weight variance with Rao-Wu resampling, omits the documented combined_weights=False branch, and overstates that IF-based replicate support always uses compute_replicate_if_variance().
  • The bootstrap dispatch list incorrectly includes TripleDifference under PSU-level multiplier bootstrap, while the published support matrix says survey bootstrap is not supported there.
  • I did not find code-quality, performance, or security issues beyond these documentation-accuracy problems.

Methodology
Affected methods: survey TSL/IF variance generally, plus replicate-weight variance for CallawaySantAnna, ContinuousDiD, EfficientDiD, TripleDifference, ImputationDiD, and TwoStageDiD, and survey bootstrap dispatch for TripleDifference and TROP.

  • Severity P1: The scalar TSL equivalence in the new theory doc is mathematically wrong. docs/methodology/survey-theory.md:L447-L450 says compute_survey_vcov() with X = 1 and residuals = psi reduces to the scalar IF formula, but compute_survey_vcov() multiplies residuals by w_i internally before forming the meat. The actual equivalence is between compute_survey_vcov(X=1, residuals=eif) and compute_survey_if_variance(psi_scaled) with score-scale psi_scaled = w * eif / sum(w), which is exactly how EfficientDiD handles the replicate path. Impact: the theory document now gives the wrong variance identity and the wrong scaling rationale for future implementations/refactors. Concrete fix: rewrite that paragraph to distinguish residual-scale EIF values from score-scale psi. Refs: diff_diff/survey.py:L1428-L1451, diff_diff/survey.py:L1463-L1507, diff_diff/efficient_did.py:L1129-L1141.

  • Severity P1: The replicate-weight sections describe the wrong methodology surface. docs/methodology/survey-theory.md:L533-L544 presents theta_r = sum((w_r / w_i) * psi_i) as the general IF replicate formula and labels it “Rao-Wu reweighting,” but the registry and implementation support both combined_weights=True and combined_weights=False. The dispatch text in docs/methodology/survey-theory.md:L632-L641 is also too broad: ImputationDiD and TwoStageDiD currently use compute_replicate_refit_variance(), not IF reweighting. Impact: the document now misstates both the supported replicate-weight math and which estimators actually use the IF shortcut. Concrete fix: rename this as generic replicate-weight rescaling, document both combined_weights cases, and explicitly carve out the refit-based replicate paths for ImputationDiD and TwoStageDiD. Refs: docs/methodology/REGISTRY.md:L2376-L2385, diff_diff/survey.py:L1707-L1742, diff_diff/imputation.py:L588-L674, diff_diff/two_stage.py:L572-L688.

  • Severity P2: The bootstrap section lists TripleDifference as supporting PSU-level multiplier bootstrap under survey designs, but the library’s support matrix says survey bootstrap is unavailable for that estimator. Impact: readers will infer an unsupported inference path exists. Concrete fix: remove TripleDifference from the multiplier-bootstrap list in the new doc, or add/ship that support and update the central support matrix consistently. Refs: docs/methodology/survey-theory.md:L647-L650, docs/choosing_estimator.rst:L636-L645, diff_diff/triple_diff.py:L352-L405.

  • Severity P3: The TROP paragraph says Rao-Wu survey bootstrap “re-runs the full estimator for each replicate,” but the registry already documents the implemented deviation: the code reuses precomputed tau_it and only reweights ATT aggregation, which is mathematically equivalent. Because that deviation is documented in REGISTRY.md, this is informational rather than blocking. Impact: minor doc drift against the registry. Concrete fix: mirror the registry note or link to it. Refs: docs/methodology/survey-theory.md:L325-L331, docs/methodology/REGISTRY.md:L2355-L2359.
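
The scaling point in the first P1 finding can be checked numerically. The sketch below assumes a single stratum, no finite-population correction, and a simple between-PSU meat. The helpers cluster_meat, vcov_intercept_only, and if_variance are hypothetical stand-ins written for illustration; they are not the signatures or internals of compute_survey_vcov() or compute_survey_if_variance().

```python
import numpy as np

def cluster_meat(scores, psu):
    """Hypothetical helper: between-PSU variance of PSU score totals
    (single stratum, no FPC)."""
    labels = np.unique(psu)
    totals = np.array([scores[psu == g].sum() for g in labels])
    n_psu = len(labels)
    return n_psu / (n_psu - 1) * np.sum((totals - totals.mean()) ** 2)

def vcov_intercept_only(eif, w, psu):
    """Residual-scale path (assumed): weight the residuals internally,
    then sandwich with the bread for X = 1."""
    bread = 1.0 / w.sum()      # (X' W X)^{-1} when X is a column of ones
    scores = w * eif           # residuals multiplied by w_i before the meat
    return bread * cluster_meat(scores, psu) * bread

def if_variance(psi_scaled, psu):
    """Score-scale path (assumed): input is already weighted and normalized."""
    return cluster_meat(psi_scaled, psu)

rng = np.random.default_rng(0)
psu = np.repeat(np.arange(20), 10)           # 20 PSUs of 10 units each
w = rng.uniform(0.5, 3.0, size=psu.size)     # illustrative survey weights
eif = rng.normal(size=psu.size)              # illustrative residual-scale EIF

v_resid = vcov_intercept_only(eif, w, psu)
v_score = if_variance(w * eif / w.sum(), psu)   # psi_scaled = w * eif / sum(w)
v_wrong = if_variance(eif, psu)                 # unscaled EIF passed as psi
```

Under these assumptions v_resid and v_score agree up to floating-point error, while v_wrong does not, which is the conflation the finding describes.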

Code Quality

  • No findings in scope; this PR does not modify runtime code.

Performance

  • No findings in scope.

Maintainability

Tech Debt

  • Severity P3: docs/survey-roadmap.md marks Phase 10a complete before the theory doc is fully aligned with the current implementation. Impact: the roadmap overstates completion. Concrete fix: keep the checkmark only if the documentation fixes above land in this PR; otherwise revert it for now. Refs: docs/survey-roadmap.md:L115-L123.

Security

  • No findings in scope; docs-only change and no secrets/PII are visible in the diff.

Documentation/Tests

  • No additional findings beyond the methodology documentation issues above. No code tests were required for a docs-only PR.

Path to Approval

  1. Correct the scalar TSL/IF equivalence paragraph so it uses residual-scale EIF values in compute_survey_vcov() and score-scale psi = w * EIF / sum(w) in compute_survey_if_variance().
  2. Rewrite the replicate-weight sections to distinguish generic replicate-weight variance from Rao-Wu bootstrap, document both combined_weights modes, and note that ImputationDiD and TwoStageDiD currently use compute_replicate_refit_variance().
  3. Remove TripleDifference from the PSU multiplier-bootstrap list in survey-theory.md or add that support and align docs/choosing_estimator.rst and the codebase consistently.
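
Item 2 can be illustrated with a hedged sketch of generic replicate-weight rescaling of IF contributions. It assumes the convention used by R's survey package: combined_weights=True means the replicate columns already include the base weight, and combined_weights=False means they are multiplicative factors. The function replicate_if_variance, the delete-one-PSU factors, and the JK1 scale factor are hypothetical choices for illustration, not the behavior of compute_replicate_if_variance().

```python
import numpy as np

def replicate_if_variance(psi, base_w, rep_w, combined_weights=True):
    """Hypothetical JK1-style sketch of IF-based replicate variance."""
    # Per-unit rescaling ratio under each replicate:
    #   combined weights -> divide out the base weight (w_r / w_i)
    #   factor weights   -> the column is already the ratio
    ratio = rep_w / base_w[:, None] if combined_weights else rep_w
    theta_full = psi.sum()
    theta_r = (ratio * psi[:, None]).sum(axis=0)
    n_rep = rep_w.shape[1]
    # JK1 scale factor is an assumption; other methods use other factors.
    return (n_rep - 1) / n_rep * np.sum((theta_r - theta_full) ** 2)

rng = np.random.default_rng(1)
n_psu, per_psu = 8, 5
psu = np.repeat(np.arange(n_psu), per_psu)
base_w = rng.uniform(0.5, 3.0, size=psu.size)
psi = base_w * rng.normal(size=psu.size) / base_w.sum()   # score-scale psi

# Delete-one-PSU factors: 0 for the dropped PSU, n/(n-1) elsewhere.
dropped = psu[:, None] == np.arange(n_psu)[None, :]
factors = np.where(dropped, 0.0, n_psu / (n_psu - 1))
combined = base_w[:, None] * factors

v_combined = replicate_if_variance(psi, base_w, combined, combined_weights=True)
v_factors = replicate_if_variance(psi, base_w, factors, combined_weights=False)
```

In this sketch the two modes give the same variance because the combined columns are the base weights times the factor columns. Refit-based paths such as those noted for ImputationDiD and TwoStageDiD re-estimate under each replicate and are not covered here.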

- P1: Correct scalar TSL/IF equivalence paragraph to distinguish
  residual-scale EIF (compute_survey_vcov input) from score-scale psi
  (compute_survey_if_variance input), matching EfficientDiD's actual usage
- P1: Rewrite replicate-weight section to document both combined_weights
  modes, carve out ImputationDiD/TwoStageDiD refit path, and drop
  incorrect "Rao-Wu reweighting" label
- P2: Remove TripleDifference from PSU multiplier bootstrap list to match
  choosing_estimator.rst compatibility matrix
- P3: Fix TROP description to note precomputed tau optimization per
  REGISTRY.md
- P3: Replace all hard-coded survey.py line numbers with function names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 5, 2026

/ai-review

@github-actions

github-actions bot commented Apr 5, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 5c4f428fdce2a9842ffda7faa33aced9b868aea9


Overall Assessment

✅ Looks good

Executive Summary

Methodology

  • No findings. Affected methods reviewed: survey TSL variance, replicate-weight variance, and survey-bootstrap dispatch across CallawaySantAnna, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, TripleDifference, StaggeredTripleDifference, SunAbraham, SyntheticDiD, and TROP. The changed text now matches the implementation and registry, including the documented TROP deviation at docs/methodology/REGISTRY.md:L2355.

Code Quality

  • No findings. No runtime code changed.

Performance

  • No findings. Docs-only PR.

Maintainability

Tech Debt

Security

  • No findings. Docs-only change and no secrets or PII are visible in scope.

Documentation/Tests

  • No findings. No code-path changes were introduced, so the absence of new tests is acceptable here. Residual risk is limited to future documentation drift, partially mitigated by docs/doc-deps.yaml:L452.

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) Apr 5, 2026
@igerber igerber merged commit 7777644 into main Apr 5, 2026
@igerber igerber deleted the survey-theory-doc branch April 5, 2026 14:34