
Tutorial 21: HAD pre-test workflow (composite QUG + Stute + Yatchew)#409

Open
igerber wants to merge 7 commits into main from had-tutorial-21-pretest-workflow

Conversation

@igerber
Owner

@igerber igerber commented May 10, 2026

Summary

  • New end-to-end practitioner tutorial for did_had_pretest_workflow. It walks through the composite pre-test battery on a panel close in shape to T20's brand campaign, surfaces the structural gap on the two-period (aggregate="overall") path (no Step 2 / parallel pre-trends check), and upgrades to the multi-period (aggregate="event_study") path, which adds the joint pre-trends Stute and joint homogeneity Stute diagnostics.
  • The DGP uses Uniform[$0.01K, $50K] for regional spend (vs T20's [$5K, $50K]) — true support strictly positive but very near zero, chosen so QUG fails to reject H0: d_lower = 0 in this finite sample. Independently, HAD's design="auto" selects the continuous_at_zero identification path via its min/median dose heuristic on the same panel (a heuristic that does not consume the QUG result, and a workflow decision distinct from the true DGP support — explicitly distinguished in the tutorial prose).
  • A side panel exercises both yatchew_hr_test null modes: null="linearity" (default, paper Theorem 7) vs null="mean_independence" (PR #397: Add yatchew_hr_test(null='mean_independence') mode, R-parity with R YatchewTest::yatchew_test(order=0)) on the within-pre-period first difference paired with post-period dose. Illustrates the stricter null's larger residual variance (sigma2_lin 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49).
  • Tutorial framing throughout treats fail-to-reject pre-tests as non-rejection evidence under finite-sample power and test specification, NOT as proof that identifying assumptions hold.
  • Companion drift-test file (tests/test_t21_had_pretest_workflow_drift.py, 15 tests) pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, and bootstrap p-value tolerance bands per feedback_bootstrap_drift_tests_need_backend_tolerance.
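The tolerance-band pattern named in the last bullet can be sketched in a few lines. This is an illustrative helper, not the repository's actual pin values or helper name; it only shows the idea of replacing an exact bootstrap p-value pin with a band wide enough to absorb backend-to-backend RNG differences.

```python
# Hypothetical sketch of a backend-tolerant bootstrap p-value pin, in the
# spirit of feedback_bootstrap_drift_tests_need_backend_tolerance: assert the
# p-value stays inside a band around its pinned center rather than matching
# an exact value. Helper name and band width are illustrative.

def assert_p_in_band(p_value, center, tol):
    lo, hi = center - tol, center + tol
    assert lo <= p_value <= hi, f"p={p_value} outside [{lo}, {hi}]"

# A p-value pinned at 0.29 with a +/-0.05 band (values illustrative):
assert_p_in_band(0.29, center=0.29, tol=0.05)
```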

Surfaces touched

  • docs/tutorials/21_had_pretest_workflow.ipynb (new, 20 cells: 6 code + 14 markdown)
  • tests/test_t21_had_pretest_workflow_drift.py (new, 15 tests)
  • docs/tutorials/20_had_brand_campaign.ipynb Section 6 Extensions (forward-pointer to T21)
  • docs/tutorials/README.md (T21 catalog entry)
  • CHANGELOG.md [Unreleased] Added entry
  • TODO.md row 112 (T21 marked done; T22 row remains queued)
  • docs/doc-deps.yaml had_pretests.py block (T21 tutorial entry)

No source code changes in diff_diff/. T22 weighted/survey HAD tutorial remains queued as a separate notebook PR per project_had_followups.md.

Test plan

  • pytest tests/test_t21_had_pretest_workflow_drift.py -v (Rust backend, 15/15 expected)
  • DIFF_DIFF_BACKEND=python pytest tests/test_t21_had_pretest_workflow_drift.py -v (pure-Python backend, 15/15 expected)
  • pytest --nbmake docs/tutorials/21_had_pretest_workflow.ipynb (notebook executes cleanly)
  • pytest tests/test_t20_had_brand_campaign_drift.py -v (T20 drift unaffected by Section 6 forward-pointer edit, 13/13 expected)

🤖 Generated with Claude Code

igerber and others added 4 commits May 9, 2026 19:36
End-to-end practitioner walkthrough for `did_had_pretest_workflow` building
on T20's brand-campaign framing. Uses a Design 1 (`continuous_at_zero`)
panel variant (Uniform[$0.01K, $50K] vs T20's [$5K, $50K]) so the QUG step
fails to reject and the verdict text fires the load-bearing
"Assumption 7 deferred" pivot for the upgrade-arc narrative.

Three sections:
- Overall workflow on a two-period collapse: Step 1 + Step 3 only;
  verdict explicitly flags Step 2 as deferred (single pre-period).
- Upgrade to event_study workflow: closes all three testable steps
  via QUG + joint pre-trends Stute (3 horizons) + joint homogeneity
  Stute (4 horizons); verdict reads "TWFE admissible under Section 4
  assumptions".
- Yatchew side panel comparing null="linearity" (default, paper Theorem 7)
  vs null="mean_independence" (Phase 4 R-parity with R
  YatchewTest::yatchew_test(order=0)) on the within-pre-period
  first-difference paired with post-period dose.

Companion drift-test file with 15 tests pinning panel composition, both
verdict pivots, structural anchors on both paths, deterministic stats,
and bootstrap p-value tolerance bands per backend.

Updates T20 Section 6 Extensions with a forward-pointer to T21,
`docs/tutorials/README.md` with a T21 entry, `docs/doc-deps.yaml`
`had_pretests.py` block, CHANGELOG `[Unreleased]`, and the T21/T22
TODO row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dependence

PR #397 added the `null="mean_independence"` mode; PR #400 was the
release-rollup that bundled it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection vs proof

Two methodology framing issues in T21:

1. The DGP `Uniform[$0.01K, $50K]` has support strictly above zero. The
   tutorial / README / CHANGELOG / drift-test docstrings called it a "true
   Design 1 (`continuous_at_zero`)" panel, conflating "QUG fails-to-reject
   d_lower=0 in this finite sample" with "the true DGP support is at zero".
   Reframe across all surfaces: the DGP has a strictly-positive but very
   near-zero lower bound chosen so QUG fails to reject; HAD's `design="auto"`
   then selects the `continuous_at_zero` identification path on that QUG
   outcome (a workflow decision following the test, not a property of the
   true DGP).

2. The notebook over-described fail-to-reject pre-tests as "formal validation",
   "conclusive", "closes assumptions", "TWFE admissible without methodological
   caveat". Soften to "diagnostics fail to reject", "supports but does not
   prove", "non-rejection evidence under finite-sample power and test
   specification". Pre-test tutorials should teach the limits of pre-tests,
   not paper over them.

Also extracts a `yatchew_side_panel_inputs` fixture in the drift test to
deduplicate post_dose / dy construction across the two side-panel tests.

Numerical pins unchanged; all 15 drift tests still pass on both backends;
notebook executes cleanly; T20 drift unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed tutorial

Two stale shorthand phrasings inconsistent with the revised methodology framing:

- Section 7 Extensions: "single Design 1 panel" → "single panel where QUG
  led the workflow to select the continuous_at_zero (Design 1) identification
  path" (matches the corrected Section 2 wording).
- `test_event_study_pretrends_fails_to_reject` docstring quoted "close to
  alpha = 0.05 but conclusive"; the user-facing text now says
  "warrants scrutiny" - update internal docstring to match.

No methodology change, no new pins; all 15 drift tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

PR Review Report

Overall Assessment

⚠️ Needs changes — one P1 methodology/documentation mismatch.

Executive Summary

  • No diff_diff/ source implementation changes were introduced.
  • The HAD overall-path Step 2 deferral is documented in REGISTRY.md, so it is not a defect.
  • P1: new prose says design="auto" selects continuous_at_zero based on the QUG outcome, but the actual/registry contract uses the independent _detect_design() min/median/modal heuristic.
  • P2: REGISTRY.md still says the T21 tutorial is queued, while this PR marks it landed elsewhere.
  • I did not load docs/tutorials/*.ipynb, per the review carve-out.
  • Verification: AST parse of the new test file passed; pytest is not installed in this environment.

Methodology

Finding 1 — P1

Severity: P1
Location: CHANGELOG.md:L11, tests/test_t21_had_pretest_workflow_drift.py:L16-L21, tests/test_t21_had_pretest_workflow_drift.py:L188-L192; compare diff_diff/had.py:L1951-L2006, docs/methodology/REGISTRY.md:L2528

Impact: The new docs/test prose says the QUG fail-to-reject outcome lets design="auto" select continuous_at_zero. That conflicts with the actual HAD default behavior: _detect_design() selects continuous_at_zero when d.min() == 0 or d.min() < 0.01 * median(abs(d)), then checks modal-min mass. It does not consume the QUG p-value. This blurs a diagnostic pre-test with the estimator’s default design-dispatch rule.

Concrete fix: Reword the changed prose to separate the two facts: QUG fails to reject H0: d_lower = 0, and independently this sample’s near-zero minimum dose satisfies the design="auto" heuristic. If the tutorial wants a QUG-driven practitioner decision, show an explicit HAD(design="continuous_at_zero") choice after the pre-test instead of attributing it to design="auto". If the notebook contains the same wording, update it too. Add a drift assertion for actual HAD(design="auto") design == "continuous_at_zero" / target_parameter == "WAS" if the tutorial claims that estimator path.

Finding 2 — P3 Informational

Severity: P3
Location: docs/methodology/REGISTRY.md:L2489-L2491, docs/methodology/REGISTRY.md:L2542-L2550

Impact: The two-period aggregate="overall" workflow omits Step 2 / Assumption 7 pre-trends and reports that caveat. This is explicitly documented in the registry and is correctly treated as a structural limitation, not a defect.

Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new test file parses successfully. Pattern-wide grep for inline inference anti-patterns found no new changed-source occurrence; there are no modified estimator/inference paths.

Performance

No findings. The new drift tests use bootstrap-heavy checks, but that is appropriate for tutorial drift coverage and not a runtime library path.

Maintainability

No additional P0/P1/P2 findings beyond the registry/doc status issue listed under Documentation/Tests.

Tech Debt

No blocking tech-debt issue. The TODO row now tracks T22 as remaining, which is consistent with the PR’s stated deferred work.

Security

No findings. No secrets or security-sensitive code paths were introduced in the reviewed non-notebook diff.

Documentation/Tests

Finding 3 — P2

Severity: P2
Location: docs/methodology/REGISTRY.md:L2509, docs/methodology/REGISTRY.md:L2556; compare TODO.md:L112, docs/doc-deps.yaml:L407-L409, docs/tutorials/README.md:L106-L112

Impact: The methodology registry still says the T21 tutorial is queued/remaining, while this PR marks T21 as landed in TODO, README, changelog, and doc-deps. Since the registry is the project’s methodology source of truth for reviews, stale phase status creates avoidable confusion.

Concrete fix: Update REGISTRY.md to mark T21 as shipped and leave only T22 as remaining, including the drift-test reference if desired.

Audit Confirmation

  • Sibling-surface mirror audit: checked README, TODO, doc-deps, registry, T20 drift-test pattern, and overall/event-study report surfaces.
  • Pattern-wide grep: checked the QUG/design wording and inline inference anti-patterns across allowed files.
  • Reciprocal/symmetry check: checked both aggregate="overall" and aggregate="event_study" workflow semantics.
  • Transitive workflow dependency audit: no GH Actions or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.

Path to Approval

  1. Correct the QUG/design="auto" wording in CHANGELOG.md, the new drift-test docstrings, and any matching T21 notebook prose so it matches _detect_design() and REGISTRY.md.
  2. If the tutorial claims the estimator’s auto-selected design/target, add a drift assertion for the actual HAD fit result’s design and target_parameter.

P1 — Decouple QUG from HAD's `design="auto"` selector across all surfaces.
The two are independent rules: QUG is a statistical pre-test on
`H0: d_lower = 0`; `_detect_design()` is a min/median heuristic
(`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`). On
T21's panel both checks point to the same identification path but the
mechanisms are independent — `_detect_design()` does not consume the
QUG p-value. Reword tutorial Section 2 + Section 3, CHANGELOG entry,
and drift-test docstrings to reflect this.
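The independence claimed above can be made concrete with a small sketch of the min/median branch. This mirrors only the prose contract quoted in this commit; the real `_detect_design()` in `diff_diff/had.py` also applies the modal-min mass check, which is omitted here. Note that nothing in the sketch consumes a QUG p-value.

```python
import numpy as np

def detect_design_sketch(d):
    """Min/median branch only: continuous_at_zero fires when the minimum
    dose is zero or below 1% of the median absolute dose. The real
    _detect_design() also checks modal-min mass; omitted in this sketch."""
    d = np.asarray(d, dtype=float)
    if d.min() == 0 or d.min() < 0.01 * np.median(np.abs(d)):
        return "continuous_at_zero"
    return "not_continuous_at_zero"

# T21-style doses (in $K): min 0.02 vs median 25 -> threshold 0.25, fires.
print(detect_design_sketch([0.02, 10.0, 25.0, 40.0, 50.0]))  # continuous_at_zero
# T20-style doses: min 5 vs median 27.5 -> threshold 0.275, does not fire.
print(detect_design_sketch([5.0, 20.0, 35.0, 50.0]))         # not_continuous_at_zero
```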

Add `test_had_design_auto_lands_on_continuous_at_zero`: explicitly
fits `HAD(design="auto")` on the two-period panel and asserts
`design == "continuous_at_zero"` and `target_parameter == "WAS"`,
locking the prose claim independently of the QUG-test pins.

P2 — Update REGISTRY.md to mark T21 shipped (PR #409); leave T22 row
queued.

All 16 drift tests pass on both backends; notebook executes cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2a6c045be4350bb0f99e96067b302a8d5971b440


PR Review Report

Overall Assessment

Looks good — no unmitigated P0/P1 findings. One non-blocking P2 documentation cleanup remains.

Executive Summary

  • Prior P1 QUG/design="auto" mismatch is resolved in the reviewed non-notebook surfaces.
  • No diff_diff/ estimator, weighting, variance, SE, or inference code changes were introduced.
  • HAD workflow methodology matches REGISTRY.md: overall path documents Step 2 deferral; event-study path closes it with joint Stute diagnostics.
  • P2: a few status/count docs are stale after T21 landed.
  • I did not load docs/tutorials/*.ipynb, per the explicit exclusion. pytest could not run because this environment lacks pytest/numpy; AST parse of the new test file passed.

Methodology

Finding M1 — P3 Informational

Severity: P3
Location: CHANGELOG.md:L11, tests/test_t21_had_pretest_workflow_drift.py:L16-L26, tests/test_t21_had_pretest_workflow_drift.py:L302-L321; compare diff_diff/had.py:L1951-L2006, docs/methodology/REGISTRY.md:L2528

Impact: The previous P1 is addressed: the prose now separates QUG fail-to-reject from HAD’s independent design="auto" min/median heuristic, and the drift test explicitly fits HAD(design="auto") and asserts continuous_at_zero / WAS.

Concrete fix: None required.

Finding M2 — P3 Informational

Severity: P3
Location: docs/methodology/REGISTRY.md:L2489-L2491, docs/methodology/REGISTRY.md:L2542-L2550, tests/test_t21_had_pretest_workflow_drift.py:L170-L191

Impact: The two-period aggregate="overall" path still omits Step 2 / Assumption 7 pre-trends, but this is explicitly documented and the verdict is drift-tested. This is not a methodology defect.

Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new drift test is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage, not runtime library code.

Maintainability

No blocking findings. The remaining issue is documentation consistency, listed below.

Tech Debt

No blocking tech-debt issue. T22 remains tracked in TODO.md:L112, which is acceptable deferred work.

Security

No findings. Secret-pattern grep across the changed non-notebook files returned no hits.

Documentation/Tests

Finding D1 — P2

Severity: P2
Location: docs/methodology/REGISTRY.md:L2509, CHANGELOG.md:L11, CHANGELOG.md:L15; compare docs/methodology/REGISTRY.md:L2556, tests/test_t21_had_pretest_workflow_drift.py:L153-L337

Impact: Documentation status is still partially stale: the registry’s older Phase 3 follow-up paragraph still says practitioner_next_steps() and the tutorial are queued for Phase 5, and an older changelog entry still says T21 remains queued. The new changelog entry also says the companion drift file has 15 tests, while the file and registry now show 16.

Concrete fix: Update REGISTRY.md:L2509 and CHANGELOG.md:L15 to say Phase 5 wave 1 and T21 have landed, with only T22 queued. Change CHANGELOG.md:L11 from “15 tests” to “16 tests”.

Audit Confirmation

  • Sibling-surface mirror audit: checked changelog, registry, TODO, doc-deps, README, HAD _detect_design, and HAD pretest workflow surfaces.
  • Pattern-wide grep: checked QUG/design wording, T21 queued/status wording, test counts, and inline inference anti-patterns across allowed files.
  • Reciprocal/symmetry check: checked overall vs event-study HAD workflow semantics.
  • Transitive workflow deps: no GH Actions workflow or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.
  • Verification: ast.parse() passed for tests/test_t21_had_pretest_workflow_drift.py; full test execution was blocked because pytest and numpy are not installed.

igerber and others added 2 commits May 9, 2026 20:49
- REGISTRY.md L2509: practitioner_next_steps + T21 tutorial were marked
  "queued for Phase 5"; both now landed (PR #402 + PR #409). Update to
  reflect actual status; T22 remains queued.
- CHANGELOG.md L11 (T21 entry): drift-test count was "15 tests"; now 16
  (after the new test_had_design_auto_lands_on_continuous_at_zero added
  in R1).
- CHANGELOG.md L15 (PR #402 entry, retroactive): said "T21 pretest
  tutorial and T22 weighted/survey tutorial remain queued"; T21 has
  since landed in PR #409. Update to reflect that.

No methodology change; no test surface changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 — CELL_07 first bullet had a conceptual error in describing the QUG
mechanic: "D_(1) is small relative to the gap D_(2)-D_(1)" — actually
D_(1) ≈ 0.181 and the gap ≈ 0.047, so D_(1) is 3.86x LARGER than the
gap. QUG fails to reject because T = D_(1)/(D_(2)-D_(1)) =
3.86 lands below the critical value 19, NOT because of any "small
relative to the gap" relationship. Rewrote to state the test statistic
and critical value directly.
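The corrected mechanic reduces to two lines. The order statistics and the critical value 19 are the ones quoted above; the function name is illustrative, not the library's API.

```python
# Illustrative check of the corrected QUG mechanic described above: the
# statistic is the smallest dose order statistic divided by the gap to the
# second smallest, compared against the quoted critical value of 19.
def qug_statistic(d_1, d_2):
    return d_1 / (d_2 - d_1)

T = qug_statistic(0.181, 0.181 + 0.047)  # D_(1) ~ 0.181, gap ~ 0.047
print(round(T, 2))   # 3.85 on these rounded inputs (the text quotes 3.86 unrounded)
print(T < 19)        # True: statistic below the critical value, fails to reject
```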

P3 polish:
- CELL_03: "approximately 0.007" → "below 0.01" (avoids numerical drift
  on a stat that scales with seed; the heuristic threshold itself is
  what matters).
- CELL_07: added a one-line aside reconciling `all_pass=True` with
  Step 2 deferral on the overall path: `all_pass` aggregates only the
  steps that ran on each dispatch, so True here means "of the two
  steps run, neither rejected" — not that Assumption 7 has been cleared.
- CELL_09: explained the very-large-negative `T_hr` ≈ -35,000 as a
  scale artifact (sigma2_diff scales with the squared dose-step gap;
  on Uniform[0.01, 50] doses with a true slope of 100, adjacent-by-dose
  units have dy gaps that swamp sigma2_lin). Adds explicit reference
  forward to the side panel where a different input gives T_hr ≈ 0
  as a sanity check.
- CELL_17: tightened mean_independence vs linearity framing to "linear
  fit absorbs any apparent slope (real or sample noise)" — the
  pre-period has no real signal so the original "absorbs the
  dose-response signal" wording was off-target on this panel.
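The CELL_09 scale artifact can be reproduced with a generic Yatchew-style differencing comparison. This follows the differencing-test idea in spirit only; the normalization is schematic and the code is not the library's yatchew_hr_test implementation, and the DGP parameters below merely echo the ones quoted in this commit.

```python
import numpy as np

# Generic Yatchew-style differencing sketch reproducing the SIGN of the scale
# artifact: with a steep true slope (100) on Uniform[0.01, 50] doses,
# adjacent-by-dose first differences carry slope * dose-gap signal on top of
# noise, so the differencing variance swamps the linear-fit residual variance
# and the statistic goes strongly negative. Normalization is schematic.
rng = np.random.default_rng(42)
n = 2000
dose = np.sort(rng.uniform(0.01, 50.0, n))
y = 100.0 * dose + rng.normal(0.0, 2.5, n)

beta = np.polyfit(dose, y, 1)                        # linear fit
sigma2_lin = np.mean((y - np.polyval(beta, dose)) ** 2)

sigma2_diff = np.mean(np.diff(y) ** 2) / 2.0         # adjacent-by-dose differencing

T_hr = np.sqrt(n) * (sigma2_lin - sigma2_diff) / sigma2_diff
print(sigma2_diff > sigma2_lin)  # True: differencing variance inflated by the slope
print(T_hr < 0)                  # True: same large-negative sign as CELL_09
```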

No methodology change; all 16 drift tests still pass; nbmake clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: d9ea86a5494fdd041c8dd8f750ebcf92fd76bce4


PR Review Report

Overall Assessment

Looks good — no unmitigated P0/P1 findings. One P2 maintainability/doc-impact mapping issue remains, but it does not block approval.

Executive Summary

  • Prior stale status/count issue is resolved in the supplied diff: T21 is recorded as landed, T22 remains queued, and the drift-test count is 16.
  • No diff_diff/ estimator, weighting, variance, SE, or default-behavior code changes are introduced.
  • HAD pretest methodology aligns with REGISTRY.md: overall path explicitly defers Step 2; event-study path runs joint pre-trends and joint homogeneity Stute diagnostics.
  • QUG fail-to-reject is now separated from HAD design="auto" resolution, matching _detect_design() and registry guidance.
  • I did not load docs/tutorials/*.ipynb, per the explicit exclusion. Full pytest could not run because this environment lacks pytest and numpy; AST parse/count found 16 tests.

Methodology

Finding M1 — P3 Informational

Severity: P3
Location: tests/test_t21_had_pretest_workflow_drift.py:L170-L321, docs/methodology/REGISTRY.md:L2489-L2557; compare diff_diff/had_pretests.py:L4413-L4850, diff_diff/had.py:L1951-L2006

Impact: The prior methodology concern is addressed. The tutorial/test surfaces distinguish QUG from HAD(design="auto"), and the overall vs event-study workflow behavior matches the registry and source contracts.

Concrete fix: None required.

Code Quality

No findings. The new test file is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage.

Maintainability

Finding MT1 — P2

Severity: P2
Location: docs/doc-deps.yaml:L366-L409, tests/test_t21_had_pretest_workflow_drift.py:L302-L321

Impact: T21 is mapped under diff_diff/had_pretests.py, but the tutorial/drift test also locks HAD(design="auto") and _detect_design() behavior from diff_diff/had.py. A future had.py design-selection change could miss T21 in the manual docs-impact map.

Concrete fix: Also add docs/tutorials/21_had_pretest_workflow.ipynb to the diff_diff/had.py docs block, with a note that it drift-locks design="auto" / _detect_design() behavior via tests/test_t21_had_pretest_workflow_drift.py.

Tech Debt

No blocking findings. T22 remains tracked in TODO.md, which is acceptable deferred tutorial work.

Security

No findings. Secret-pattern grep over the changed non-notebook files found no secrets.

Documentation/Tests

No blocking findings. The prior documentation/count issue is resolved in the supplied diff. Verification was limited by missing local dependencies: python -m pytest tests/test_t21_had_pretest_workflow_drift.py -q could not run because pytest is not installed, and importing the package path was blocked by missing numpy.

Audit Confirmation

  • Sibling-surface mirror audit: checked changelog, registry, TODO, README, doc-deps, HAD _detect_design, and HAD pretest workflow surfaces.
  • Pattern-wide grep: checked inline inference/NaN anti-patterns; no new occurrence in the changed test/docs surfaces.
  • Reciprocal/symmetry check: checked overall vs event-study workflow and QUG vs design="auto" treatment.
  • Transitive workflow deps: no GH Actions workflow or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.
