1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `heterogeneity="<col>"`** (Web Appendix Section 1.5, Lemma 7). Per-path heterogeneity coefficient is computed by re-running the Lemma 7 regression on each path-restricted switcher subsample. The path filter (`path_groups: Optional[Set[int]]`) restricts eligibility to switchers ON path `p` inside the inner regression; the variance machinery (standard WLS vcov for non-survey, cell-period IF allocator for Binder TSL, group-level allocator for Rao-Wu replicate) is unchanged from the global heterogeneity path. Cohort dummies in the design matrix absorb baseline by construction, so multi-baseline switcher panels do not produce R-divergence (no parallel `UserWarning` like `controls` / `trends_linear`). Surfaces on `results.path_heterogeneity_effects` keyed `{path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}}` and on `results.to_dataframe(level="by_path")` via new always-present `het_*` columns (`het_beta`, `het_se`, `het_t_stat`, `het_p_value`, `het_conf_int_lower`, `het_conf_int_upper`), populated for positive-horizon rows when `heterogeneity` is set and NaN otherwise (mirrors the `cband_*` and `cumulated_*` always-present convention). Composes with `survey_design` (analytical Binder TSL + replicate-weight bootstrap) via the existing PR #408 IF allocator path; under replicate weights, every per-(path, horizon) fit appends `n_valid` to the shared `_replicate_n_valid_list` accumulator and the final `_effective_df_survey` recomputation reflects all per-path appends. R parity verified against `did_multiplegt_dyn(..., by_path=3, predict_het=list("het_x", c(1,2,3)))` on the new `multi_path_reversible_by_path_predict_het` golden-value scenario; a sibling global anchor `multi_path_reversible_predict_het` introduces the FIRST `predict_het` R-parity baseline in the repo (no prior `TestDCDHDynRParityHeterogeneity` existed). 
Both R calls use `dont_drop_larger_lower=TRUE` to match the Python `drop_larger_lower=False` requirement and to provide cohort variation at every horizon under reversal paths. Per-path SE matches global SE bit-exactly on a single-path panel (telescope invariant, `atol=rtol=1e-14`). Multiplier bootstrap (`n_bootstrap > 0`) under `by_path + heterogeneity + survey_design` inherits the existing per-path multiplier-bootstrap-survey gate from PR #408. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1230-1234` is removed; `heterogeneity` precondition mutex with `controls` / `trends_linear` / `trends_nonparam` stays in place. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity` (~13 tests across gate dispatch, behavior, single-path telescope, zero-signal anti-regression, multi-baseline UserWarning anti-regression, DataFrame integration, edge cases) + `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneity` (global anchor) + `::TestDCDHDynRParityByPathHeterogeneity` (per-path). See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path heterogeneity testing" for the full contract.
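
The per-path mechanic above — re-running the Lemma 7 heterogeneity regression on a path-restricted switcher subsample — can be sketched as a plain weighted least squares on the subsample. This is a minimal illustration, not the library's implementation: `path_restricted_wls` is a hypothetical helper name, and the real design matrix includes cohort dummies and the library's own weighting.

```python
import numpy as np

def path_restricted_wls(X, y, w, path, p):
    """Sketch: re-fit a WLS on the rows belonging to path p only.

    X: (n, k) design (heterogeneity covariate plus intercept/cohort dummies),
    y: horizon-l outcome products, w: regression weights, path: per-row path id.
    Hypothetical helper; the actual Lemma 7 regression lives in the library.
    """
    m = path == p
    Xp, yp, wp = X[m], y[m], w[m]
    W = np.diag(wp)
    # Solve the weighted normal equations (X'WX) beta = X'Wy on the subsample.
    return np.linalg.solve(Xp.T @ W @ Xp, Xp.T @ W @ yp)
```

On a panel with a single observed path, the subsample equals the full sample, which is the telescope invariant the entry pins at `atol=rtol=1e-14`.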
- **Tutorial 21: HAD Pre-test Workflow** (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `HeterogeneousAdoptionDiD` building on Tutorial 20's brand-campaign framing. Uses a 60-DMA × 8-week panel close in shape to T20's but with the dose distribution drawn from `Uniform[$0.01K, $50K]` (vs T20's `[$5K, $50K]`); the true support is strictly positive but very near zero, chosen so the QUG step in `did_had_pretest_workflow` fails to reject `H0: d_lower = 0` in this finite sample and the verdict text fires the load-bearing "Assumption 7 deferred" pivot for the upgrade-arc narrative. (HAD's `design="auto"` selector — a separate min/median heuristic at `had.py::_detect_design`, NOT the QUG p-value — independently lands on the `continuous_at_zero` identification path with target `WAS` on this panel because `d.min() < 0.01 * median(|d|)`. The QUG test and the design selector are independent rules that point to the same identification path here.) Walks through three surfaces: (a) `did_had_pretest_workflow(aggregate="overall")` on a two-period collapse, where the verdict explicitly flags Step 2 (Assumption 7 pre-trends) as not run because a single pre-period structurally cannot support a pre-trends test, and the structural fields `pretrends_joint` / `homogeneity_joint` are both `None`; (b) `did_had_pretest_workflow(aggregate="event_study")` on the full multi-period panel, where the verdict reads "TWFE admissible under Section 4 assumptions" because all three testable diagnostics (QUG + joint pre-trends Stute over 3 horizons + joint homogeneity Stute over 4 horizons) fail to reject — non-rejection evidence under finite-sample power and test specification, not proof that the identifying assumptions hold; and (c) a side panel exercising both `yatchew_hr_test` null modes — `null="linearity"` (default, paper Theorem 7) vs `null="mean_independence"` (Phase 4 R-parity with R `YatchewTest::yatchew_test(order=0)`) — on the within-pre-period
first-difference paired with post-period dose, illustrating the stricter null's larger residual variance (`sigma2_lin` 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (16 tests pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, bootstrap p-value tolerance bands per `feedback_bootstrap_drift_tests_need_backend_tolerance`, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel). T20's "Composite pretest workflow" Extensions bullet updated with a forward-pointer to T21. T22 weighted/survey HAD tutorial remains queued as a separate notebook PR.
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `survey_design`** for analytical Binder TSL SE and replicate-weight bootstrap variance. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1233-1239` is replaced by a per-path multiplier-bootstrap-only gate (`survey_design + n_bootstrap > 0` under by_path / paths_of_interest still raises, since the survey-aware perturbation pivot for path-restricted IFs is methodologically underived). Per-path SE routes through the existing `_survey_se_from_group_if` cell-period allocator: the per-period IF (`U_pp_l_path`) is built with non-path switcher-side contributions skipped (control contributions are unchanged, matching the joiners/leavers IF convention; preserves the row-sum identity `U_pp.sum(axis=1) == U`), cohort-recentered via `_cohort_recenter_per_period`, then expanded to observations as `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})`. Replicate-weight designs unconditionally use the cell allocator (Class A contract from PR #323). New `_refresh_path_inference` helper post-call refreshes `safe_inference` on every populated entry across `multi_horizon_inference`, `placebo_horizon_inference`, `path_effects`, and `path_placebos` so all four surfaces use the same final `df_survey` after per-path replicate fits append `n_valid` to the shared accumulator. Path-enumeration ranking under `survey_design` remains unweighted (group-cardinality, not population-weight mass). Lonely-PSU policy stays sample-wide, not per-path. Telescope invariant: on a single-path panel, per-path SE matches the global non-by_path survey SE bit-exactly. **No R parity** — R `did_multiplegt_dyn` does not support survey weighting; this is a Python-only methodology extension. 
The global non-by_path TSL multiplier-bootstrap path is unaffected (anti-regression test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical::test_global_survey_plus_n_bootstrap_still_works` locks the per-path-only scope of the new gate). Cross-surface invariants regression-tested at `TestByPathSurveyDesignAnalytical` (~17 tests across gate / dispatch / analytical SE / replicate-weight SE / per-path placebos / `trends_linear` composition / unobserved-path warnings / final-df refresh regressions) and `TestByPathSurveyDesignTelescope`. See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path survey-design SE" for the full contract.
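
The cell-period allocator expansion described above can be sketched in a few lines of numpy. This is an illustrative stand-in under assumed shapes (not the library's `_survey_se_from_group_if`): it implements `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})` and makes the row-sum identity `U_pp.sum(axis=1) == U` checkable.

```python
import numpy as np

def expand_if_to_observations(U_pp, g, t, w):
    """Sketch of the cell-period IF allocator (hypothetical signature).

    U_pp: (G, T) per-(group, period) influence contributions whose row sums
    give the group-level IF; g, t: per-observation group/period indices;
    w: per-observation weights. Returns psi_i = U_pp[g_i, t_i] * w_i / W_{g_i, t_i},
    where W_{g, t} is the total weight in cell (g, t).
    """
    W = np.zeros(U_pp.shape)
    np.add.at(W, (g, t), w)          # per-cell weight totals
    return U_pp[g, t] * w / W[g, t]  # weight-share allocation within each cell
```

Because the allocation is a weight share within each cell, summing `psi` over a cell recovers `U_pp[g, t]`, and summing over a group recovers the group-level IF — the identity the path-restricted construction is required to preserve.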
- **Inference-field aliases on staggered result classes** for adapter / external-consumer compatibility. Read-only `@property` aliases expose the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names (matching `DiDResults` / `TROPResults` / `SyntheticDiDResults` / `HeterogeneousAdoptionDiDResults`) on every result class that previously only carried prefixed canonical fields: `CallawaySantAnnaResults`, `StackedDiDResults`, `EfficientDiDResults`, `ChaisemartinDHaultfoeuilleResults`, `StaggeredTripleDiffResults`, `WooldridgeDiDResults`, `SunAbrahamResults`, `ImputationDiDResults`, `TwoStageDiDResults` (mapping to `overall_*`); `ContinuousDiDResults` (mapping to `overall_att_*`, ATT-side as the headline, ACRT-side accessible unchanged via `overall_acrt_*`); `MultiPeriodDiDResults` (mapping to `avg_*`). `ContinuousDiDResults` additionally exposes `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases for naming consistency with the rest of the staggered family. Aliases are pure read-throughs over the canonical fields — no recomputation, no behavior change — so the `safe_inference()` joint-NaN contract (per CLAUDE.md "Inference computation") is inherited automatically (NaN canonical → NaN alias, locked at `tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan`). The native `overall_*` / `overall_att_*` / `avg_*` fields remain canonical for documentation and computation. Motivated by the `balance.interop.diff_diff.as_balance_diagnostic()` adapter (`facebookresearch/balance` PR #465) which calls `getattr(res, "se", None)` / `getattr(res, "conf_int", None)` without a fallback chain — pre-alias, every staggered result class returned `None` on those keys, silently dropping `se` and `conf_int` from the adapter's diagnostic dict. 23 alias-mechanic + balance-adapter regression tests at `tests/test_result_aliases.py`. Patch-level (additive on stable surfaces).
185 changes: 185 additions & 0 deletions benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -618,6 +618,61 @@ extract_dcdh_by_path <- function(res, n_effects, n_placebos = 0) {
list(by_path = out)
}

# Helper: extract global predict_het results. R's predict_het slot is at
# res$results$predict_het, a data.frame with columns
# {effect, covariate, Estimate, SE, t, LB, UB, N, pF}. Estimate is the
# WLS coefficient on the heterogeneity covariate.
extract_dcdh_predict_het <- function(res, n_effects) {
ph <- res$results$predict_het
horizons <- list()
if (is.null(ph) || nrow(ph) == 0) return(list(predict_het = horizons))
for (h in seq_len(min(n_effects, nrow(ph)))) {
horizons[[as.character(ph$effect[h])]] <- list(
beta = as.numeric(ph$Estimate[h]),
se = as.numeric(ph$SE[h]),
t = as.numeric(ph$t[h]),
ci_lo = as.numeric(ph$LB[h]),
ci_hi = as.numeric(ph$UB[h]),
n_obs = as.numeric(ph$N[h]),
p_value = as.numeric(ph$pF[h])
)
}
list(predict_het = horizons)
}

# Helper: extract per-path predict_het results. Under by_path=k +
# predict_het, R's per-by_level dispatcher writes a predict_het table to
# each res$by_level_i$results$predict_het. Output mirrors
# extract_dcdh_by_path's shape with a horizons dict keyed by horizon.
extract_dcdh_by_path_predict_het <- function(res, n_effects) {
by_levels <- res$by_levels
out <- list()
for (i in seq_along(by_levels)) {
slot <- res[[paste0("by_level_", i)]]
ph <- slot$results$predict_het
horizons <- list()
if (!is.null(ph) && nrow(ph) > 0) {
for (h in seq_len(min(n_effects, nrow(ph)))) {
horizons[[as.character(ph$effect[h])]] <- list(
beta = as.numeric(ph$Estimate[h]),
se = as.numeric(ph$SE[h]),
t = as.numeric(ph$t[h]),
ci_lo = as.numeric(ph$LB[h]),
ci_hi = as.numeric(ph$UB[h]),
n_obs = as.numeric(ph$N[h]),
p_value = as.numeric(ph$pF[h])
)
}
}
out[[i]] <- list(
path = by_levels[i],
frequency_rank = i,
horizons = horizons
)
}
list(by_path_predict_het = out)
}
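
Both extractors above emit a horizons dict keyed by horizon label, with seven numeric fields per record. A consumer-side shape check (hypothetical, Python-side — not part of this generator script) makes the contract explicit for the golden-value parity tests:

```python
REQUIRED_KEYS = {"beta", "se", "t", "ci_lo", "ci_hi", "n_obs", "p_value"}

def check_predict_het_horizons(horizons):
    """Sketch: validate the horizons dict shape emitted by the R extractors.

    horizons maps a horizon label (e.g. "1") to a dict with the seven
    numeric fields listed in REQUIRED_KEYS.
    """
    for label, rec in horizons.items():
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"horizon {label} missing {sorted(missing)}")
        if not all(isinstance(v, (int, float)) for v in rec.values()):
            raise TypeError(f"horizon {label} has non-numeric fields")
    return True
```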

# Scenario 13: mixed_single_switch + by_path=2 (basic 2-path case).
# The mixed_single_switch DGP produces joiners (path 0,1,1,1) and
# leavers (path 1,0,0,0) as its only two observed paths at L_max=3, so
Expand Down Expand Up @@ -1018,6 +1073,136 @@ cat(" Scenario 19: multi_path_reversible_by_path_non_binary\n")
)
}

# Scenarios 20 + 21: predict_het R-parity (Wave 5 #11). Scenario 20 is
# the global anchor (predict_het without by_path); scenario 21 is the
# per-path version (by_path + predict_het). Both use the same DGP so the
# Python-side parity tests can calibrate atol on the global scenario and
# inherit the same atol for the per-path test. R's predict_het syntax
# requires a 2-element list: list(covariate_name, horizons_vec).
#
# DGP shape: 90 switchers + 30 never-treated controls, 10 periods, 3
# paths (0,1,1,1) / (0,1,0,0) / (0,1,1,0), F_g varies in {3,4,5}
# INDEPENDENTLY of path so each path has multiple cohorts. het_x is
# binary {0,1}, balanced across both switchers and controls. Effect =
# 5.0 + 3.0 * het_x to produce a detectable heterogeneity signal.
# Never-treated controls are required for R's predict_het to compute
# horizons l>=2 under reversal paths (otherwise R returns
# "max effects = 2" or empty cohort dummies). `dont_drop_larger_lower
# = TRUE` matches the Python by_path requirement that
# `drop_larger_lower=False`.
cat(" Scenarios 20/21: multi_path_reversible_predict_het + by_path version\n")
{
set.seed(120L)
n_switchers20 <- 90L
n_controls20 <- 30L
n_groups20 <- n_switchers20 + n_controls20
n_periods20 <- 10L
paths20 <- list(c(0L, 1L, 1L, 1L), c(0L, 1L, 0L, 0L), c(0L, 1L, 1L, 0L))
D20 <- matrix(0L, nrow = n_groups20, ncol = n_periods20)
het_x20 <- integer(n_groups20)
group_fe20 <- rnorm(n_groups20, 0, 2.0)
# Switchers (groups 1..n_switchers20)
for (g in seq_len(n_switchers20)) {
F_g_choice <- ((g - 1L) %/% 3L) %% 3L
F_g <- 3L + F_g_choice
path_idx <- ((g - 1L) %% 3L) + 1L
pp <- paths20[[path_idx]]
het_x20[g] <- if (g <= n_switchers20 %/% 2L) 1L else 0L
for (j in seq_along(pp)) {
t <- F_g - 1L + j
if (t >= 1L && t <= n_periods20) D20[g, t] <- pp[j]
}
if (F_g - 1L + length(pp) <= n_periods20) {
tail_t <- (F_g - 1L + length(pp)):n_periods20
D20[g, tail_t] <- pp[length(pp)]
}
}
# Never-treated controls (groups n_switchers20+1..n_groups20).
# Balanced het_x assignment so the heterogeneity covariate has
# variation in the control pool too.
for (g in (n_switchers20 + 1L):n_groups20) {
k <- g - n_switchers20
het_x20[g] <- if (k <= n_controls20 %/% 2L) 1L else 0L
}
noise20 <- matrix(rnorm(n_groups20 * n_periods20, 0, 0.5),
nrow = n_groups20, ncol = n_periods20)
period_arr20 <- 0:(n_periods20 - 1L)
effect20 <- 5.0 + 3.0 * het_x20 # heterogeneity signal
Y20 <- matrix(group_fe20, nrow = n_groups20, ncol = n_periods20) +
matrix(0.5 * period_arr20, nrow = n_groups20, ncol = n_periods20, byrow = TRUE) +
matrix(effect20, nrow = n_groups20, ncol = n_periods20) * D20 +
noise20
d20 <- data.frame(
group = rep(seq_len(n_groups20) - 1L, each = n_periods20),
period = rep(period_arr20, n_groups20),
treatment = as.vector(t(D20)),
outcome = as.vector(t(Y20)),
het_x = rep(het_x20, each = n_periods20)
)

# Scenario 20: global predict_het anchor (no by_path).
# `dont_drop_larger_lower = TRUE` preserves multi-switch cohorts so the
# heterogeneity regression has cohort variation at every horizon.
# Without this, R drops off-switch paths at l>=2, leaving a single
# cohort and triggering the `prod_het_l_XX ~ het_x + ` empty-cohort
# error. dCDH `by_path` requires `drop_larger_lower=False` on the Python
# side anyway, so this flag is consistent with the per-path scope.
res20 <- did_multiplegt_dyn(
df = d20, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3,
dont_drop_larger_lower = TRUE,
predict_het = list("het_x", c(1, 2, 3)),
ci_level = 95, graph_off = TRUE
)
scenarios$multi_path_reversible_predict_het <- list(
data = list(
group = as.numeric(d20$group),
period = as.numeric(d20$period),
treatment = as.numeric(d20$treatment),
outcome = as.numeric(d20$outcome),
het_x = as.numeric(d20$het_x)
),
params = list(pattern = "multi_path_reversible_predict_het",
n_switchers = n_switchers20, n_controls = n_controls20,
n_groups = n_groups20, n_periods = n_periods20,
seed = 120L, effects = 3,
predict_het_var = "het_x",
predict_het_horizons = c(1, 2, 3),
ci_level = 95,
dont_drop_larger_lower = TRUE),
results = extract_dcdh_predict_het(res20, n_effects = 3)
)

# Scenario 21: by_path + predict_het (per-path version on same DGP).
# `dont_drop_larger_lower = TRUE` matches scenario 20 + Python
# by_path's `drop_larger_lower=False` requirement.
res21 <- did_multiplegt_dyn(
df = d20, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3, by_path = 3,
dont_drop_larger_lower = TRUE,
predict_het = list("het_x", c(1, 2, 3)),
ci_level = 95, graph_off = TRUE
)
scenarios$multi_path_reversible_by_path_predict_het <- list(
data = list(
group = as.numeric(d20$group),
period = as.numeric(d20$period),
treatment = as.numeric(d20$treatment),
outcome = as.numeric(d20$outcome),
het_x = as.numeric(d20$het_x)
),
params = list(pattern = "multi_path_reversible_by_path_predict_het",
n_switchers = n_switchers20, n_controls = n_controls20,
n_groups = n_groups20, n_periods = n_periods20,
seed = 120L, effects = 3, by_path = 3,
predict_het_var = "het_x",
predict_het_horizons = c(1, 2, 3),
ci_level = 95,
dont_drop_larger_lower = TRUE),
results = extract_dcdh_by_path_predict_het(res21, n_effects = 3)
)
}
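
The DGP comment for scenarios 20/21 claims that F_g varies in {3, 4, 5} independently of path, so each path has multiple cohorts. A small Python sketch mirroring the R assignment rules (`F_g = 3 + ((g - 1) %/% 3) %% 3`, `path_idx = (g - 1) %% 3 + 1`) lets that claim be checked directly:

```python
def cohorts_per_path(n_switchers=90):
    """Mirror the switcher-assignment rules from the R block above and
    return, per path index, the set of first-switch cohorts it contains."""
    seen = {}
    for g in range(1, n_switchers + 1):
        F_g = 3 + ((g - 1) // 3) % 3   # cohort cycles every 3 groups
        path_idx = (g - 1) % 3 + 1     # path cycles every group
        seen.setdefault(path_idx, set()).add(F_g)
    return seen
```

Because the two cycles have coprime phase over each block of 9 groups, every path accumulates all three cohorts, which is what keeps the per-path heterogeneity regressions from collapsing to a single cohort dummy.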

# ---------------------------------------------------------------------------
# Write output
# ---------------------------------------------------------------------------