Tutorial 21: HAD pre-test workflow (composite QUG + Stute + Yatchew) #409
End-to-end practitioner walkthrough for `did_had_pretest_workflow` building on T20's brand-campaign framing. Uses a Design 1 (`continuous_at_zero`) panel variant (Uniform[$0.01K, $50K] vs T20's [$5K, $50K]) so the QUG step fails to reject and the verdict text fires the load-bearing "Assumption 7 deferred" pivot for the upgrade-arc narrative.

Three sections:
- Overall workflow on a two-period collapse: Step 1 + Step 3 only; the verdict explicitly flags Step 2 as deferred (single pre-period).
- Upgrade to the event_study workflow: closes all three testable steps via QUG + joint pre-trends Stute (3 horizons) + joint homogeneity Stute (4 horizons); the verdict reads "TWFE admissible under Section 4 assumptions".
- Yatchew side panel comparing null="linearity" (default, paper Theorem 7) vs null="mean_independence" (Phase 4 R-parity with R YatchewTest::yatchew_test(order=0)) on the within-pre-period first difference paired with post-period dose.

Companion drift-test file with 15 tests pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic stats, and bootstrap p-value tolerance bands per backend.

Updates T20 Section 6 Extensions with a forward pointer to T21, `docs/tutorials/README.md` with a T21 entry, the `docs/doc-deps.yaml` `had_pretests.py` block, CHANGELOG `[Unreleased]`, and the T21/T22 TODO row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
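The Yatchew side panel contrasts two nulls for the heteroskedasticity-robust test. As a rough illustration of the underlying difference-based idea (a conceptual sketch only, not the package's `yatchew_hr_test` API; the function name and statistic form here are assumptions): the test compares an adjacent-by-dose difference variance estimate, which is insensitive to any smooth dose-response, with the residual variance of the model implied by the null (a linear fit under `null="linearity"`, a bare mean under `null="mean_independence"`).

```python
import numpy as np

def yatchew_like_stat(y, d, null="linearity"):
    """Conceptual sketch of a Yatchew-style specification test (assumed
    form; NOT the library's yatchew_hr_test implementation).

    sigma2_diff: variance estimate from adjacent-by-dose first
    differences, robust to any smooth dose-response m(d).
    sigma2_res: residual variance under the null's restricted model.
    A large positive statistic says the restricted model underfits.
    """
    order = np.argsort(d)
    y_sorted = y[order]
    sigma2_diff = np.mean(np.diff(y_sorted) ** 2) / 2.0
    if null == "linearity":
        X = np.column_stack([np.ones_like(d), d])      # y = a + b*d
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    else:  # "mean_independence": restricted model is the constant mean
        resid = y - y.mean()
    sigma2_res = np.mean(resid ** 2)
    return np.sqrt(len(y)) * (sigma2_res - sigma2_diff) / sigma2_diff

rng = np.random.default_rng(7)
d = rng.uniform(0.01, 50.0, 500)
y = 2.0 + 0.5 * d + rng.normal(0.0, 1.0, 500)  # truly linear in dose
t_lin = yatchew_like_stat(y, d, null="linearity")
t_mi = yatchew_like_stat(y, d, null="mean_independence")
# The stricter mean-independence null leaves the real slope in the
# residual, so its residual variance and statistic are far larger.
print(t_lin < t_mi)
```

This mirrors the side panel's qualitative point that the stricter null carries the larger residual variance; the actual sigma2 and p-value pins quoted in the PR come from the library's implementation, not from this sketch.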
…ection vs proof

Two methodology framing issues in T21:

1. The DGP `Uniform[$0.01K, $50K]` has support strictly above zero. The tutorial / README / CHANGELOG / drift-test docstrings called it a "true Design 1 (`continuous_at_zero`)" panel, conflating "QUG fails to reject d_lower = 0 in this finite sample" with "the true DGP support is at zero". Reframe across all surfaces: the DGP has a strictly positive but very-near-zero lower bound chosen so QUG fails to reject; HAD's `design="auto"` then selects the `continuous_at_zero` identification path on that QUG outcome (a workflow decision following the test, not a property of the true DGP).

2. The notebook over-described fail-to-reject pre-tests as "formal validation", "conclusive", "closes assumptions", "TWFE admissible without methodological caveat". Soften to "diagnostics fail to reject", "supports but does not prove", "non-rejection evidence under finite-sample power and test specification". Pre-test tutorials should teach the limits of pre-tests, not paper over them.

Also extracts a `yatchew_side_panel_inputs` fixture in the drift test to deduplicate post_dose / dy construction across the two side-panel tests.

Numerical pins unchanged; all 15 drift tests still pass on both backends; the notebook executes cleanly; T20 drift is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed tutorial

Two stale shorthand phrasings inconsistent with the revised methodology framing:

- Section 7 Extensions: "single Design 1 panel" → "single panel where QUG led the workflow to select the continuous_at_zero (Design 1) identification path" (matches the corrected Section 2 wording).
- The `test_event_study_pretrends_fails_to_reject` docstring quoted "close to alpha = 0.05 but conclusive"; the user-facing text now says "warrants scrutiny"; update the internal docstring to match.

No methodology change, no new pins; all 15 drift tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Review Report

Overall Assessment

Executive Summary
Methodology

Finding 1 — P1
Severity: P1
Impact: The new docs/test prose says the QUG fail-to-reject outcome lets …
Concrete fix: Reword the changed prose to separate the two facts: QUG fails to reject …

Finding 2 — P3 Informational
Severity: P3
Impact: The two-period …
Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new test file parses successfully. A pattern-wide grep for inline inference anti-patterns found no new changed-source occurrence; there are no modified estimator/inference paths.

Performance

No findings. The new drift tests use bootstrap-heavy checks, but that is appropriate for tutorial drift coverage and not a runtime library path.

Maintainability

No additional P0/P1/P2 findings beyond the registry/doc status issue listed under Documentation/Tests.

Tech Debt

No blocking tech-debt issue. The TODO row now tracks T22 as remaining, which is consistent with the PR's stated deferred work.

Security

No findings. No secrets or security-sensitive code paths were introduced in the reviewed non-notebook diff.

Documentation/Tests

Finding 3 — P2
Severity: P2
Impact: The methodology registry still says the T21 tutorial is queued/remaining, while this PR marks T21 as landed in TODO, README, changelog, and doc-deps. Since the registry is the project's methodology source of truth for reviews, stale phase status creates avoidable confusion.
Concrete fix: Update …

Audit Confirmation
Path to Approval
P1 — Decouple QUG from HAD's `design="auto"` selector across all surfaces. The two are independent rules: QUG is a statistical pre-test on `H0: d_lower = 0`; `_detect_design()` is a min/median heuristic (`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`). On T21's panel both checks point to the same identification path, but the mechanisms are independent — `_detect_design()` does not consume the QUG p-value. Reword tutorial Section 2 + Section 3, the CHANGELOG entry, and the drift-test docstrings to reflect this.

Add `test_had_design_auto_lands_on_continuous_at_zero`: explicitly fits `HAD(design="auto")` on the two-period panel and asserts `design == "continuous_at_zero"` and `target_parameter == "WAS"`, locking the prose claim independently of the QUG-test pins.

P2 — Update REGISTRY.md to mark T21 shipped (PR #409); leave the T22 row queued.

All 16 drift tests pass on both backends; the notebook executes cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
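The min/median heuristic described in this commit is simple enough to sketch stand-alone. A minimal version assuming the stated rule (`continuous_at_zero` when `d.min() < 0.01 * median(|d|)`); the fallback label below is a placeholder for illustration, not necessarily the library's other design name:

```python
import numpy as np

def detect_design(d, rel_tol=0.01):
    """Sketch of the min/median rule (assumed form of _detect_design):
    classify the dose support as continuous-at-zero when the smallest
    dose is tiny relative to the typical dose magnitude."""
    d = np.asarray(d, dtype=float)
    if d.min() < rel_tol * np.median(np.abs(d)):
        return "continuous_at_zero"
    return "other_design"  # placeholder label for the non-matching case

# T21-style doses: Uniform[0.01, 50] has a minimum near 0.01 and a
# median near 25, so min < 0.01 * 25 = 0.25 fires the
# continuous_at_zero branch even though the true support never
# touches zero.
rng = np.random.default_rng(0)
print(detect_design(rng.uniform(0.01, 50.0, 5000)))
```

This is exactly the decoupling the commit insists on: the heuristic looks only at the realized dose vector and never consumes the QUG p-value.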
/ai-review |
🔁 AI review rerun (requested by @igerber)
Head SHA:

PR Review Report

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. One non-blocking P2 documentation cleanup remains.

Executive Summary
Methodology

Finding M1 — P3 Informational
Severity: P3
Impact: The previous P1 is addressed: the prose now separates QUG fail-to-reject from HAD's independent …
Concrete fix: None required.

Finding M2 — P3 Informational
Severity: P3
Impact: The two-period …
Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new drift test is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage, not runtime library code.

Maintainability

No blocking findings. The remaining issue is documentation consistency, listed below.

Tech Debt

No blocking tech-debt issue. T22 remains tracked in …

Security

No findings. A secret-pattern grep across the changed non-notebook files returned no hits.

Documentation/Tests

Finding D1 — P2
Severity: P2
Impact: Documentation status is still partially stale: the registry's older Phase 3 follow-up paragraph still says …
Concrete fix: Update …

Audit Confirmation
- REGISTRY.md L2509: practitioner_next_steps + the T21 tutorial were marked "queued for Phase 5"; both have now landed (PR #402 + PR #409). Update to reflect actual status; T22 remains queued.
- CHANGELOG.md L11 (T21 entry): the drift-test count was "15 tests"; now 16 (after the new test_had_design_auto_lands_on_continuous_at_zero added in R1).
- CHANGELOG.md L15 (PR #402 entry, retroactive): said "T21 pretest tutorial and T22 weighted/survey tutorial remain queued"; T21 has since landed in PR #409. Update to reflect that.

No methodology change; no test-surface changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 — CELL_07's first bullet had a conceptual error in describing the QUG mechanic: "D_(1) is small relative to the gap D_(2)-D_(1)". Actually D_(1) ≈ 0.181 and the gap ≈ 0.047, so D_(1) is 3.86x LARGER than the gap. The reason QUG fails to reject is that T = D_(1)/(D_(2)-D_(1)) = 3.86 lands below the critical value 19, NOT because of any "small relative to the gap" relationship. Rewrote to state the test statistic and critical value directly.

P3 polish:
- CELL_03: "approximately 0.007" → "below 0.01" (avoids numerical drift on a stat that scales with the seed; the heuristic threshold itself is what matters).
- CELL_07: added a one-line aside reconciling `all_pass=True` with Step 2 deferral on the overall path: `all_pass` aggregates only the steps that ran on each dispatch, so True here means "of the two steps run, neither rejected", not that Assumption 7 has been cleared.
- CELL_09: explained the very-large-negative `T_hr` ≈ -35,000 as a scale artifact (sigma2_diff scales with the squared dose-step gap; on Uniform[0.01, 50] doses with a true slope of 100, adjacent-by-dose units have dy gaps that swamp sigma2_lin). Adds an explicit forward reference to the side panel, where a different input gives T_hr ≈ 0 as a sanity check.
- CELL_17: tightened the mean_independence vs linearity framing to "linear fit absorbs any apparent slope (real or sample noise)"; the pre-period has no real signal, so the original "absorbs the dose-response signal" wording was off-target on this panel.

No methodology change; all 16 drift tests still pass; nbmake clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
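The corrected CELL_07 arithmetic can be checked directly. A sketch of the ratio statistic exactly as the fix states it (the library's actual QUG test may add scaling or calibration; this only reproduces the quoted numbers):

```python
import numpy as np

def qug_ratio(d):
    """Ratio of the smallest dose to the gap between the two smallest
    order statistics, T = D_(1) / (D_(2) - D_(1)), per the fixed
    CELL_07 text. Large T is evidence against H0: d_lower = 0."""
    d1, d2 = np.sort(np.asarray(d, dtype=float))[:2]
    return d1 / (d2 - d1)

# Quoted values: D_(1) ~= 0.181, gap ~= 0.047 (so D_(2) ~= 0.228).
# The larger doses in the vector are arbitrary filler.
T = qug_ratio([0.181, 0.228, 1.4, 7.9, 42.0])
# T ~= 3.85 with these rounded inputs: D_(1) is almost 4x the gap,
# yet T sits far below the quoted critical value of 19, so the test
# fails to reject.
print(round(T, 2))
```

Note the sign of the intuition fix: a *large* ratio, not a small D_(1), is what would reject; here the ratio is large relative to 1 but still far below the critical value.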
/ai-review |
🔁 AI review rerun (requested by @igerber)
Head SHA:

PR Review Report

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. One P2 maintainability/doc-impact mapping issue remains, but it does not block approval.

Executive Summary
Methodology

Finding M1 — P3 Informational
Severity: P3
Impact: The prior methodology concern is addressed. The tutorial/test surfaces distinguish QUG from …
Concrete fix: None required.

Code Quality

No findings. The new test file is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage.

Maintainability

Finding MT1 — P2
Severity: P2
Impact: T21 is mapped under …
Concrete fix: Also add …

Tech Debt

No blocking findings. T22 remains tracked in …

Security

No findings. A secret-pattern grep over the changed non-notebook files found no secrets.

Documentation/Tests

No blocking findings. The prior documentation/count issue is resolved in the supplied diff. Verification was limited by missing local dependencies: …

Audit Confirmation
Summary
- New tutorial for `did_had_pretest_workflow` that walks through the composite pre-test battery on a panel close in shape to T20's brand campaign, surfaces the structural gap on the two-period (`aggregate="overall"`) path (no Step 2 / parallel pre-trends), and upgrades to the multi-period (`aggregate="event_study"`) path that adds the joint pre-trends Stute and joint homogeneity Stute diagnostics.
- `Uniform[$0.01K, $50K]` for regional spend (vs T20's `[$5K, $50K]`) — true support strictly positive but very near zero, chosen so QUG fails to reject `H0: d_lower = 0` in this finite sample. HAD's `design="auto"` then selects the `continuous_at_zero` identification path on the QUG outcome (a workflow decision following the test, not a property of the true DGP support — explicitly distinguished in the tutorial prose).
- Side panel running both `yatchew_hr_test` null modes side by side: `null="linearity"` (default, paper Theorem 7) vs `null="mean_independence"` (PR "Add yatchew_hr_test(null='mean_independence') mode" #397, R-parity with R `YatchewTest::yatchew_test(order=0)`) on the within-pre-period first difference paired with post-period dose. Illustrates the stricter null's larger residual variance (`sigma2_lin` 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49).
- Companion drift-test file (`tests/test_t21_had_pretest_workflow_drift.py`, 15 tests) pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, and bootstrap p-value tolerance bands per `feedback_bootstrap_drift_tests_need_backend_tolerance`.

Surfaces touched
- `docs/tutorials/21_had_pretest_workflow.ipynb` (new, 20 cells: 6 code + 14 markdown)
- `tests/test_t21_had_pretest_workflow_drift.py` (new, 15 tests)
- `docs/tutorials/20_had_brand_campaign.ipynb` Section 6 Extensions (forward pointer to T21)
- `docs/tutorials/README.md` (T21 catalog entry)
- `CHANGELOG.md` `[Unreleased]` Added entry
- `TODO.md` row 112 (T21 marked done; T22 row remains queued)
- `docs/doc-deps.yaml` `had_pretests.py` block (T21 tutorial entry)

No source code changes in `diff_diff/`. The T22 weighted/survey HAD tutorial remains queued as a separate notebook PR per `project_had_followups.md`.

Test plan
- `pytest tests/test_t21_had_pretest_workflow_drift.py -v` (Rust backend, 15/15 expected)
- `DIFF_DIFF_BACKEND=python pytest tests/test_t21_had_pretest_workflow_drift.py -v` (pure-Python backend, 15/15 expected)
- `pytest --nbmake docs/tutorials/21_had_pretest_workflow.ipynb` (notebook executes cleanly)
- `pytest tests/test_t20_had_brand_campaign_drift.py -v` (T20 drift unaffected by the Section 6 forward-pointer edit, 13/13 expected)

🤖 Generated with Claude Code