
Tutorial 21: HAD pre-test workflow (composite QUG + Stute + Yatchew)#409

Open
igerber wants to merge 7 commits into main from had-tutorial-21-pretest-workflow

Conversation

@igerber
Owner

@igerber igerber commented May 10, 2026

Summary

  • New end-to-end practitioner tutorial for did_had_pretest_workflow. It walks through the composite pre-test battery on a panel close in shape to T20's brand campaign, surfaces the structural gap on the two-period (aggregate="overall") path (no Step 2 / parallel pre-trends check), and upgrades to the multi-period (aggregate="event_study") path, which adds the joint pre-trends Stute and joint homogeneity Stute diagnostics.
  • The DGP uses Uniform[$0.01K, $50K] for regional spend (vs T20's [$5K, $50K]) — true support strictly positive but very near zero, chosen so QUG fails to reject H0: d_lower = 0 in this finite sample. Independently, HAD's design="auto" selects the continuous_at_zero identification path via its min/median dose heuristic on the same panel (a heuristic that does not consume the QUG result, and a workflow decision distinct from the true DGP support — explicitly distinguished in the tutorial prose).
  • A side panel exercises both yatchew_hr_test null modes: null="linearity" (default, paper Theorem 7) vs null="mean_independence" (PR #397: Add yatchew_hr_test(null='mean_independence') mode, R-parity with R YatchewTest::yatchew_test(order=0)) on the within-pre-period first difference paired with post-period dose. Illustrates the stricter null's larger residual variance (sigma2_lin 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49).
  • Tutorial framing throughout treats fail-to-reject pre-tests as non-rejection evidence under finite-sample power and test specification, NOT as proof that identifying assumptions hold.
  • Companion drift-test file (tests/test_t21_had_pretest_workflow_drift.py, 15 tests) pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, and bootstrap p-value tolerance bands per feedback_bootstrap_drift_tests_need_backend_tolerance.
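The tolerance-band pattern named in the last bullet can be sketched in a few lines. This is an illustrative helper, not the repository's actual pin values or helper name; it only shows the idea of replacing an exact bootstrap p-value pin with a band wide enough to absorb backend-to-backend RNG differences.

```python
# Hypothetical sketch of a backend-tolerant bootstrap p-value pin, in the
# spirit of feedback_bootstrap_drift_tests_need_backend_tolerance: assert the
# p-value stays inside a band around its pinned center rather than matching
# an exact value. Helper name and band width are illustrative.

def assert_p_in_band(p_value, center, tol):
    lo, hi = center - tol, center + tol
    assert lo <= p_value <= hi, f"p={p_value} outside [{lo}, {hi}]"

# A p-value pinned at 0.29 with a +/-0.05 band (values illustrative):
assert_p_in_band(0.29, center=0.29, tol=0.05)
```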

Surfaces touched

  • docs/tutorials/21_had_pretest_workflow.ipynb (new, 20 cells: 6 code + 14 markdown)
  • tests/test_t21_had_pretest_workflow_drift.py (new, 15 tests)
  • docs/tutorials/20_had_brand_campaign.ipynb Section 6 Extensions (forward-pointer to T21)
  • docs/tutorials/README.md (T21 catalog entry)
  • CHANGELOG.md [Unreleased] Added entry
  • TODO.md row 112 (T21 marked done; T22 row remains queued)
  • docs/doc-deps.yaml had_pretests.py block (T21 tutorial entry)

No source code changes in diff_diff/. T22 weighted/survey HAD tutorial remains queued as a separate notebook PR per project_had_followups.md.

Test plan

  • pytest tests/test_t21_had_pretest_workflow_drift.py -v (Rust backend, 15/15 expected)
  • DIFF_DIFF_BACKEND=python pytest tests/test_t21_had_pretest_workflow_drift.py -v (pure-Python backend, 15/15 expected)
  • pytest --nbmake docs/tutorials/21_had_pretest_workflow.ipynb (notebook executes cleanly)
  • pytest tests/test_t20_had_brand_campaign_drift.py -v (T20 drift unaffected by Section 6 forward-pointer edit, 13/13 expected)

🤖 Generated with Claude Code

igerber and others added 4 commits May 9, 2026 19:36
End-to-end practitioner walkthrough for `did_had_pretest_workflow` building
on T20's brand-campaign framing. Uses a Design 1 (`continuous_at_zero`)
panel variant (Uniform[$0.01K, $50K] vs T20's [$5K, $50K]) so the QUG step
fails to reject and the verdict text fires the load-bearing
"Assumption 7 deferred" pivot for the upgrade-arc narrative.

Three sections:
- Overall workflow on a two-period collapse: Step 1 + Step 3 only;
  verdict explicitly flags Step 2 as deferred (single pre-period).
- Upgrade to event_study workflow: closes all three testable steps
  via QUG + joint pre-trends Stute (3 horizons) + joint homogeneity
  Stute (4 horizons); verdict reads "TWFE admissible under Section 4
  assumptions".
- Yatchew side panel comparing null="linearity" (default, paper Theorem 7)
  vs null="mean_independence" (Phase 4 R-parity with R
  YatchewTest::yatchew_test(order=0)) on the within-pre-period
  first-difference paired with post-period dose.

Companion drift-test file with 15 tests pinning panel composition, both
verdict pivots, structural anchors on both paths, deterministic stats,
and bootstrap p-value tolerance bands per backend.

Updates T20 Section 6 Extensions with a forward-pointer to T21,
`docs/tutorials/README.md` with a T21 entry, `docs/doc-deps.yaml`
`had_pretests.py` block, CHANGELOG `[Unreleased]`, and the T21/T22
TODO row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dependence

PR #397 added the `null="mean_independence"` mode; PR #400 was the
release-rollup that bundled it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection vs proof

Two methodology framing issues in T21:

1. The DGP `Uniform[$0.01K, $50K]` has support strictly above zero. The
   tutorial / README / CHANGELOG / drift-test docstrings called it a "true
   Design 1 (`continuous_at_zero`)" panel, conflating "QUG fails-to-reject
   d_lower=0 in this finite sample" with "the true DGP support is at zero".
   Reframe across all surfaces: the DGP has a strictly-positive but very
   near-zero lower bound chosen so QUG fails to reject; HAD's `design="auto"`
   then selects the `continuous_at_zero` identification path on that QUG
   outcome (a workflow decision following the test, not a property of the
   true DGP).

2. The notebook over-described fail-to-reject pre-tests as "formal validation",
   "conclusive", "closes assumptions", "TWFE admissible without methodological
   caveat". Soften to "diagnostics fail to reject", "supports but does not
   prove", "non-rejection evidence under finite-sample power and test
   specification". Pre-test tutorials should teach the limits of pre-tests,
   not paper over them.

Also extracts a `yatchew_side_panel_inputs` fixture in the drift test to
deduplicate post_dose / dy construction across the two side-panel tests.

Numerical pins unchanged; all 15 drift tests still pass on both backends;
notebook executes cleanly; T20 drift unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed tutorial

Two stale shorthand phrasings inconsistent with the revised methodology framing:

- Section 7 Extensions: "single Design 1 panel" → "single panel where QUG
  led the workflow to select the continuous_at_zero (Design 1) identification
  path" (matches the corrected Section 2 wording).
- `test_event_study_pretrends_fails_to_reject` docstring quoted "close to
  alpha = 0.05 but conclusive"; the user-facing text now says
  "warrants scrutiny" - update internal docstring to match.

No methodology change, no new pins; all 15 drift tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

PR Review Report

Overall Assessment

⚠️ Needs changes — one P1 methodology/documentation mismatch.

Executive Summary

  • No diff_diff/ source implementation changes were introduced.
  • The HAD overall-path Step 2 deferral is documented in REGISTRY.md, so it is not a defect.
  • P1: new prose says design="auto" selects continuous_at_zero based on the QUG outcome, but the actual/registry contract uses the independent _detect_design() min/median/modal heuristic.
  • P2: REGISTRY.md still says the T21 tutorial is queued, while this PR marks it landed elsewhere.
  • I did not load docs/tutorials/*.ipynb, per the review carve-out.
  • Verification: AST parse of the new test file passed; pytest is not installed in this environment.

Methodology

Finding 1 — P1

Severity: P1
Location: CHANGELOG.md:L11, tests/test_t21_had_pretest_workflow_drift.py:L16-L21, tests/test_t21_had_pretest_workflow_drift.py:L188-L192; compare diff_diff/had.py:L1951-L2006, docs/methodology/REGISTRY.md:L2528

Impact: The new docs/test prose says the QUG fail-to-reject outcome lets design="auto" select continuous_at_zero. That conflicts with the actual HAD default behavior: _detect_design() selects continuous_at_zero when d.min() == 0 or d.min() < 0.01 * median(abs(d)), then checks modal-min mass. It does not consume the QUG p-value. This blurs a diagnostic pre-test with the estimator’s default design-dispatch rule.

Concrete fix: Reword the changed prose to separate the two facts: QUG fails to reject H0: d_lower = 0, and independently this sample’s near-zero minimum dose satisfies the design="auto" heuristic. If the tutorial wants a QUG-driven practitioner decision, show an explicit HAD(design="continuous_at_zero") choice after the pre-test instead of attributing it to design="auto". If the notebook contains the same wording, update it too. Add a drift assertion for actual HAD(design="auto") design == "continuous_at_zero" / target_parameter == "WAS" if the tutorial claims that estimator path.

Finding 2 — P3 Informational

Severity: P3
Location: docs/methodology/REGISTRY.md:L2489-L2491, docs/methodology/REGISTRY.md:L2542-L2550

Impact: The two-period aggregate="overall" workflow omits Step 2 / Assumption 7 pre-trends and reports that caveat. This is explicitly documented in the registry and is correctly treated as a structural limitation, not a defect.

Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new test file parses successfully. Pattern-wide grep for inline inference anti-patterns found no new changed-source occurrence; there are no modified estimator/inference paths.

Performance

No findings. The new drift tests use bootstrap-heavy checks, but that is appropriate for tutorial drift coverage and not a runtime library path.

Maintainability

No additional P0/P1/P2 findings beyond the registry/doc status issue listed under Documentation/Tests.

Tech Debt

No blocking tech-debt issue. The TODO row now tracks T22 as remaining, which is consistent with the PR’s stated deferred work.

Security

No findings. No secrets or security-sensitive code paths were introduced in the reviewed non-notebook diff.

Documentation/Tests

Finding 3 — P2

Severity: P2
Location: docs/methodology/REGISTRY.md:L2509, docs/methodology/REGISTRY.md:L2556; compare TODO.md:L112, docs/doc-deps.yaml:L407-L409, docs/tutorials/README.md:L106-L112

Impact: The methodology registry still says the T21 tutorial is queued/remaining, while this PR marks T21 as landed in TODO, README, changelog, and doc-deps. Since the registry is the project’s methodology source of truth for reviews, stale phase status creates avoidable confusion.

Concrete fix: Update REGISTRY.md to mark T21 as shipped and leave only T22 as remaining, including the drift-test reference if desired.

Audit Confirmation

  • Sibling-surface mirror audit: checked README, TODO, doc-deps, registry, T20 drift-test pattern, and overall/event-study report surfaces.
  • Pattern-wide grep: checked the QUG/design wording and inline inference anti-patterns across allowed files.
  • Reciprocal/symmetry check: checked both aggregate="overall" and aggregate="event_study" workflow semantics.
  • Transitive workflow dependency audit: no GH Actions or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.

Path to Approval

  1. Correct the QUG/design="auto" wording in CHANGELOG.md, the new drift-test docstrings, and any matching T21 notebook prose so it matches _detect_design() and REGISTRY.md.
  2. If the tutorial claims the estimator’s auto-selected design/target, add a drift assertion for the actual HAD fit result’s design and target_parameter.

P1 — Decouple QUG from HAD's `design="auto"` selector across all surfaces.
The two are independent rules: QUG is a statistical pre-test on
`H0: d_lower = 0`; `_detect_design()` is a min/median heuristic
(`continuous_at_zero` fires when `d.min() < 0.01 * median(|d|)`). On
T21's panel both checks point to the same identification path but the
mechanisms are independent — `_detect_design()` does not consume the
QUG p-value. Reword tutorial Section 2 + Section 3, CHANGELOG entry,
and drift-test docstrings to reflect this.
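The independence claimed above can be made concrete with a small sketch of the min/median branch. This mirrors only the prose contract quoted in this commit; the real `_detect_design()` in `diff_diff/had.py` also applies the modal-min mass check, which is omitted here. Note that nothing in the sketch consumes a QUG p-value.

```python
import numpy as np

def detect_design_sketch(d):
    """Min/median branch only: continuous_at_zero fires when the minimum
    dose is zero or below 1% of the median absolute dose. The real
    _detect_design() also checks modal-min mass; omitted in this sketch."""
    d = np.asarray(d, dtype=float)
    if d.min() == 0 or d.min() < 0.01 * np.median(np.abs(d)):
        return "continuous_at_zero"
    return "not_continuous_at_zero"

# T21-style doses (in $K): min 0.02 vs median 25 -> threshold 0.25, fires.
print(detect_design_sketch([0.02, 10.0, 25.0, 40.0, 50.0]))  # continuous_at_zero
# T20-style doses: min 5 vs median 27.5 -> threshold 0.275, does not fire.
print(detect_design_sketch([5.0, 20.0, 35.0, 50.0]))         # not_continuous_at_zero
```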

Add `test_had_design_auto_lands_on_continuous_at_zero`: explicitly
fits `HAD(design="auto")` on the two-period panel and asserts
`design == "continuous_at_zero"` and `target_parameter == "WAS"`,
locking the prose claim independently of the QUG-test pins.

P2 — Update REGISTRY.md to mark T21 shipped (PR #409); leave T22 row
queued.

All 16 drift tests pass on both backends; notebook executes cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2a6c045be4350bb0f99e96067b302a8d5971b440


PR Review Report

Overall Assessment

Looks good — no unmitigated P0/P1 findings. One non-blocking P2 documentation cleanup remains.

Executive Summary

  • Prior P1 QUG/design="auto" mismatch is resolved in the reviewed non-notebook surfaces.
  • No diff_diff/ estimator, weighting, variance, SE, or inference code changes were introduced.
  • HAD workflow methodology matches REGISTRY.md: overall path documents Step 2 deferral; event-study path closes it with joint Stute diagnostics.
  • P2: a few status/count docs are stale after T21 landed.
  • I did not load docs/tutorials/*.ipynb, per the explicit exclusion. pytest could not run because this environment lacks pytest/numpy; AST parse of the new test file passed.

Methodology

Finding M1 — P3 Informational

Severity: P3
Location: CHANGELOG.md:L11, tests/test_t21_had_pretest_workflow_drift.py:L16-L26, tests/test_t21_had_pretest_workflow_drift.py:L302-L321; compare diff_diff/had.py:L1951-L2006, docs/methodology/REGISTRY.md:L2528

Impact: The previous P1 is addressed: the prose now separates QUG fail-to-reject from HAD’s independent design="auto" min/median heuristic, and the drift test explicitly fits HAD(design="auto") and asserts continuous_at_zero / WAS.

Concrete fix: None required.

Finding M2 — P3 Informational

Severity: P3
Location: docs/methodology/REGISTRY.md:L2489-L2491, docs/methodology/REGISTRY.md:L2542-L2550, tests/test_t21_had_pretest_workflow_drift.py:L170-L191

Impact: The two-period aggregate="overall" path still omits Step 2 / Assumption 7 pre-trends, but this is explicitly documented and the verdict is drift-tested. This is not a methodology defect.

Concrete fix: None required.

Code Quality

No P0/P1/P2 findings. The new drift test is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage, not runtime library code.

Maintainability

No blocking findings. The remaining issue is documentation consistency, listed below.

Tech Debt

No blocking tech-debt issue. T22 remains tracked in TODO.md:L112, which is acceptable deferred work.

Security

No findings. Secret-pattern grep across the changed non-notebook files returned no hits.

Documentation/Tests

Finding D1 — P2

Severity: P2
Location: docs/methodology/REGISTRY.md:L2509, CHANGELOG.md:L11, CHANGELOG.md:L15; compare docs/methodology/REGISTRY.md:L2556, tests/test_t21_had_pretest_workflow_drift.py:L153-L337

Impact: Documentation status is still partially stale: the registry’s older Phase 3 follow-up paragraph still says practitioner_next_steps() and the tutorial are queued for Phase 5, and an older changelog entry still says T21 remains queued. The new changelog entry also says the companion drift file has 15 tests, while the file and registry now show 16.

Concrete fix: Update REGISTRY.md:L2509 and CHANGELOG.md:L15 to say Phase 5 wave 1 and T21 have landed, with only T22 queued. Change CHANGELOG.md:L11 from “15 tests” to “16 tests”.

Audit Confirmation

  • Sibling-surface mirror audit: checked changelog, registry, TODO, doc-deps, README, HAD _detect_design, and HAD pretest workflow surfaces.
  • Pattern-wide grep: checked QUG/design wording, T21 queued/status wording, test counts, and inline inference anti-patterns across allowed files.
  • Reciprocal/symmetry check: checked overall vs event-study HAD workflow semantics.
  • Transitive workflow deps: no GH Actions workflow or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.
  • Verification: ast.parse() passed for tests/test_t21_had_pretest_workflow_drift.py; full test execution was blocked because pytest and numpy are not installed.

igerber and others added 2 commits May 9, 2026 20:49
- REGISTRY.md L2509: practitioner_next_steps + T21 tutorial were marked
  "queued for Phase 5"; both now landed (PR #402 + PR #409). Update to
  reflect actual status; T22 remains queued.
- CHANGELOG.md L11 (T21 entry): drift-test count was "15 tests"; now 16
  (after the new test_had_design_auto_lands_on_continuous_at_zero added
  in R1).
- CHANGELOG.md L15 (PR #402 entry, retroactive): said "T21 pretest
  tutorial and T22 weighted/survey tutorial remain queued"; T21 has
  since landed in PR #409. Update to reflect that.

No methodology change; no test surface changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 — CELL_07 first bullet had a conceptual error in describing the QUG
mechanic: "D_(1) is small relative to the gap D_(2)-D_(1)" — actually
D_(1) ≈ 0.181 and the gap ≈ 0.047, so D_(1) is 3.86x LARGER than the
gap. QUG fails to reject because T = D_(1)/(D_(2)-D_(1)) =
3.86 lands below the critical value 19, NOT because of any "small
relative to the gap" relationship. Rewrote to state the test statistic
and critical value directly.
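The corrected mechanic reduces to two lines. The order statistics and the critical value 19 are the ones quoted above; the function name is illustrative, not the library's API.

```python
# Illustrative check of the corrected QUG mechanic described above: the
# statistic is the smallest dose order statistic divided by the gap to the
# second smallest, compared against the quoted critical value of 19.
def qug_statistic(d_1, d_2):
    return d_1 / (d_2 - d_1)

T = qug_statistic(0.181, 0.181 + 0.047)  # D_(1) ~ 0.181, gap ~ 0.047
print(round(T, 2))   # 3.85 on these rounded inputs (the text quotes 3.86 unrounded)
print(T < 19)        # True: statistic below the critical value, fails to reject
```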

P3 polish:
- CELL_03: "approximately 0.007" → "below 0.01" (avoids numerical drift
  on a stat that scales with seed; the heuristic threshold itself is
  what matters).
- CELL_07: added a one-line aside reconciling `all_pass=True` with
  Step 2 deferral on the overall path: `all_pass` aggregates only the
  steps that ran on each dispatch, so True here means "of the two
  steps run, neither rejected" — not that Assumption 7 has been cleared.
- CELL_09: explained the very-large-negative `T_hr` ≈ -35,000 as a
  scale artifact (sigma2_diff scales with the squared dose-step gap;
  on Uniform[0.01, 50] doses with a true slope of 100, adjacent-by-dose
  units have dy gaps that swamp sigma2_lin). Adds explicit reference
  forward to the side panel where a different input gives T_hr ≈ 0
  as a sanity check.
- CELL_17: tightened mean_independence vs linearity framing to "linear
  fit absorbs any apparent slope (real or sample noise)" — the
  pre-period has no real signal so the original "absorbs the
  dose-response signal" wording was off-target on this panel.
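The CELL_09 scale artifact can be reproduced with a generic Yatchew-style differencing comparison. This follows the differencing-test idea in spirit only; the normalization is schematic and the code is not the library's yatchew_hr_test implementation, and the DGP parameters below merely echo the ones quoted in this commit.

```python
import numpy as np

# Generic Yatchew-style differencing sketch reproducing the SIGN of the scale
# artifact: with a steep true slope (100) on Uniform[0.01, 50] doses,
# adjacent-by-dose first differences carry slope * dose-gap signal on top of
# noise, so the differencing variance swamps the linear-fit residual variance
# and the statistic goes strongly negative. Normalization is schematic.
rng = np.random.default_rng(42)
n = 2000
dose = np.sort(rng.uniform(0.01, 50.0, n))
y = 100.0 * dose + rng.normal(0.0, 2.5, n)

beta = np.polyfit(dose, y, 1)                        # linear fit
sigma2_lin = np.mean((y - np.polyval(beta, dose)) ** 2)

sigma2_diff = np.mean(np.diff(y) ** 2) / 2.0         # adjacent-by-dose differencing

T_hr = np.sqrt(n) * (sigma2_lin - sigma2_diff) / sigma2_diff
print(sigma2_diff > sigma2_lin)  # True: differencing variance inflated by the slope
print(T_hr < 0)                  # True: same large-negative sign as CELL_09
```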

No methodology change; all 16 drift tests still pass; nbmake clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: d9ea86a5494fdd041c8dd8f750ebcf92fd76bce4


PR Review Report

Overall Assessment

Looks good — no unmitigated P0/P1 findings. One P2 maintainability/doc-impact mapping issue remains, but it does not block approval.

Executive Summary

  • Prior stale status/count issue is resolved in the supplied diff: T21 is recorded as landed, T22 remains queued, and the drift-test count is 16.
  • No diff_diff/ estimator, weighting, variance, SE, or default-behavior code changes are introduced.
  • HAD pretest methodology aligns with REGISTRY.md: overall path explicitly defers Step 2; event-study path runs joint pre-trends and joint homogeneity Stute diagnostics.
  • QUG fail-to-reject is now separated from HAD design="auto" resolution, matching _detect_design() and registry guidance.
  • I did not load docs/tutorials/*.ipynb, per the explicit exclusion. Full pytest could not run because this environment lacks pytest and numpy; AST parse/count found 16 tests.

Methodology

Finding M1 — P3 Informational

Severity: P3
Location: tests/test_t21_had_pretest_workflow_drift.py:L170-L321, docs/methodology/REGISTRY.md:L2489-L2557; compare diff_diff/had_pretests.py:L4413-L4850, diff_diff/had.py:L1951-L2006

Impact: The prior methodology concern is addressed. The tutorial/test surfaces distinguish QUG from HAD(design="auto"), and the overall vs event-study workflow behavior matches the registry and source contracts.

Concrete fix: None required.

Code Quality

No findings. The new test file is focused and parses successfully.

Performance

No findings. The added bootstrap-heavy checks are test-only drift coverage.

Maintainability

Finding MT1 — P2

Severity: P2
Location: docs/doc-deps.yaml:L366-L409, tests/test_t21_had_pretest_workflow_drift.py:L302-L321

Impact: T21 is mapped under diff_diff/had_pretests.py, but the tutorial/drift test also locks HAD(design="auto") and _detect_design() behavior from diff_diff/had.py. A future had.py design-selection change could miss T21 in the manual docs-impact map.

Concrete fix: Also add docs/tutorials/21_had_pretest_workflow.ipynb to the diff_diff/had.py docs block, with a note that it drift-locks design="auto" / _detect_design() behavior via tests/test_t21_had_pretest_workflow_drift.py.

Tech Debt

No blocking findings. T22 remains tracked in TODO.md, which is acceptable deferred tutorial work.

Security

No findings. Secret-pattern grep over the changed non-notebook files found no secrets.

Documentation/Tests

No blocking findings. The prior documentation/count issue is resolved in the supplied diff. Verification was limited by missing local dependencies: python -m pytest tests/test_t21_had_pretest_workflow_drift.py -q could not run because pytest is not installed, and importing the package path was blocked by missing numpy.

Audit Confirmation

  • Sibling-surface mirror audit: checked changelog, registry, TODO, README, doc-deps, HAD _detect_design, and HAD pretest workflow surfaces.
  • Pattern-wide grep: checked inline inference/NaN anti-patterns; no new occurrence in the changed test/docs surfaces.
  • Reciprocal/symmetry check: checked overall vs event-study workflow and QUG vs design="auto" treatment.
  • Transitive workflow deps: no GH Actions workflow or pytest-selection changes in this PR.
  • Exclusions honored: did not load docs/tutorials/*.ipynb.
