Skip to content

ci: harden wheels-bot Reviewer A against placeholder submissions#2561

Merged
bpamiri merged 2 commits into
developfrom
peter/bot-reviewer-a-guards
May 11, 2026
Merged

ci: harden wheels-bot Reviewer A against placeholder submissions#2561
bpamiri merged 2 commits into
developfrom
peter/bot-reviewer-a-guards

Conversation

@bpamiri
Copy link
Copy Markdown
Collaborator

@bpamiri bpamiri commented May 11, 2026

Summary

Three changes harden the wheels-bot Reviewer A pipeline against the placeholder-submission failure mode observed on #2546:

  • Post-submission guard (bot-review-a.yml): new "Validate Reviewer A output" step runs after the Claude action (with if: always()). Scans all active wheels-bot reviews on the head SHA and auto-dismisses any whose body is shorter than 200 characters or missing the canonical wheels-bot:review-a marker. Posts a one-line comment summarizing dismissals. This is the load-bearing fix — prompt instructions alone cannot prevent the model from probing gh pr review with a placeholder.
  • Marker gate trailing-colon fix (bot-review-a.yml:96): the initial review marker is <!-- wheels-bot:review-a:<pr>:<sha> --> with no suffix, but the idempotency marker-pattern ended with :, making the skip check a no-op. Response markers use wheels-bot:review-a-response: (different prefix) so removing the trailing colon doesn't false-match them.
  • Prompt tightening (review-pr.md, respond-to-critique.md): explicit instruction to issue exactly one gh pr review per session and never probe the command with a placeholder body. Belt-and-suspenders on top of the guard.

Related Issue

Closes #2558

Type of Change

  • Bug fix

Feature Completeness Checklist

  • Tests — Bot-infrastructure change; not exercised by bash tools/test-local.sh. Manual verification plan below.
  • Framework Docs — N/A (bot infra)
  • AI Reference Docs — N/A (bot infra)
  • CLAUDE.md — Not changed; the "Wheels Bot" section's high-level description still applies. The new guard step is referenced in the inline comment in bot-review-a.yml and in the prompt files.
  • CHANGELOG.md — Not added; this is bot infra, not a user-facing framework change.
  • Test runner passes — Same reason as above; no framework code touched.

Test Plan

The guard cannot be exercised pre-merge because Reviewer A only runs on PRs that match its if: conditions. Verification once merged:

  • Trigger a normal review pass (open a real PR). Confirm the guard step appears in the workflow run with ::notice::No active wheels-bot reviews on <sha> to validate OR no dismissals.
  • Confirm the marker-pattern gate now matches: open a PR, force a second synchronize push without changing the SHA (via git commit --amend --no-edit && git push --force-with-lease on the same head), confirm Reviewer A's second invocation skips with idempotency marker already present.
  • Negative test (controlled): on a throwaway test PR, manually submit a wheels-bot review with body test body via gh pr review impersonating the bot (or trigger Reviewer A with a deliberate placeholder injection if convenient), confirm the next workflow run dismisses it and posts the guard comment.

Reviewer notes

  • The guard runs against the wheels-bot identity using the same App token that submitted the reviews — dismissal requires pull_requests: write, which the App already has (it submits reviews via the same token).
  • Dismissed reviews remain visible in the GitHub UI marked "dismissed" with the explanatory message; verdicts no longer count toward branch protection. This is the right semantic — we don't want placeholder approvals to satisfy a "1 approving review" requirement.
  • Reviewer B's lookup of A's latest review (in review-the-review.md) is not updated in this PR to skip dismissed reviews. Worth a follow-up — when a real placeholder happens, B still sees the dismissed review as A's "latest" and may critique it. Listed as a checkbox in issue wheels-bot: Reviewer A submitting placeholder content instead of analysis #2558.
  • Adding bot infrastructure to the auto-downgrade list in propose-fix.md (so future bot-infra issues are held for human review instead of attempting TDD on workflow YAML) is the orthogonal follow-up from wheels-bot: Reviewer A submitting placeholder content instead of analysis #2558 — not in this PR.

Closes #2558. PR #2546's Reviewer A submitted a review
with body "test body" (state=APPROVED, no marker) before issuing the
substantive review on the same SHA 38 seconds later. Root-cause hypothesis:
Sonnet probed `gh pr review` with a placeholder before the real call.
Three changes harden the pipeline:

- bot-review-a.yml: new "Validate Reviewer A output" step. Scans all
  active wheels-bot reviews on the head SHA after the Claude action
  exits and auto-dismisses any whose body is under 200 chars or missing
  the canonical `wheels-bot:review-a` marker. Runs on `always()` so it
  fires even when the Claude step failed mid-session. This is the
  load-bearing fix — prompt instructions cannot prevent the failure mode
  on their own.

- bot-review-a.yml: remove trailing colon from the idempotency
  marker-pattern. The initial review marker is
  `<!-- wheels-bot:review-a:<pr>:<sha> -->` with no suffix; the trailing
  `:` made the skip check a no-op. Response markers use the distinct
  `wheels-bot:review-a-response:` prefix and won't false-match.

- review-pr.md and respond-to-critique.md: explicit instruction to issue
  exactly one `gh pr review` per session and never probe the command with
  a placeholder body. Belt-and-suspenders on top of the guard.
@github-actions github-actions Bot added the docs label May 11, 2026
Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A

TL;DR: This PR fixes a genuine idempotency no-op bug in bot-review-a.yml (trailing colon in marker-pattern), adds a post-submission guard that auto-dismisses bogus placeholder reviews, and tightens the prompt instructions in both review-pr.md and respond-to-critique.md. The changes are coherent, the shell logic is sound, and the YAML formatting is correct. Verdict: comment — two minor observations worth the author's attention, but neither is a blocker.


Correctness

Trailing colon fix is correct and load-bearing.
The wheels-bot-skip-check action uses grep -Eq "$MARKER_PATTERN" against the haystack of existing review bodies. The actual marker that Reviewer A embeds is <!-- wheels-bot:review-a:<pr>:<sha> --> — no trailing colon. The old pattern wheels-bot:review-a:2561:abc123: would never match it, so the idempotency gate was silently a no-op on every PR since the feature shipped. The fix removes the colon and is correct.

Guard logic is correct, but the OR condition dismisses marker-free-but-long reviews.
.github/workflows/bot-review-a.yml lines 152–157:

if [[ "$body_len" -lt 200 ]] || ! grep -q 'wheels-bot:review-a' <<<"$body"; then

A substantive (>200 char) review that forgot the marker — e.g. if the model produced a real review but omitted the closing HTML comment — would be dismissed here. That is intentional (without a marker the idempotency mechanism stays broken), but it means the guard is load-bearing for prompt compliance. The PR description acknowledges this correctly. No action required; just flag it in a follow-up comment so future maintainers understand the logic.

grep -q 'wheels-bot:review-a' correctly passes response reviews.
The response marker format (from respond-to-critique.md step 4) is <!-- wheels-bot:review-a-response:<pr>:<sha>:<N> -->, which contains wheels-bot:review-a as a substring. Response reviews would not be dismissed. ✓

Guard comment has no idempotency marker check before posting.
bot-review-a.yml line 168 onwards: the guard posts a ## Wheels Bot — Reviewer A guard comment when dismissed > 0, but does not first check whether a wheels-bot:review-a-guard:${PR_NUMBER}:${HEAD_SHA} comment already exists. The practical risk is low — once reviews are dismissed they are filtered by select(.state != "DISMISSED") so the dismissed counter would be 0 on a re-run — but a manual workflow re-trigger combined with new bogus reviews appearing on the same SHA could produce duplicate guard comments. Not a blocker; flagging for awareness.


Tests

No automated tests cover bot infrastructure, which the PR correctly acknowledges. The manual verification plan is adequate for this type of change. The framework test suite is unaffected.


Docs

Checklist is filled honestly. The "Wheels Bot" section of CLAUDE.md describes the guard conceptually correctly post-merge (the per-step detail lives in the workflow comment, which is the right place). No CHANGELOG entry needed for bot infra. ✓


Commits

Single commit ci: harden wheels-bot Reviewer A against placeholder submissions — type ci, no scope, subject ≤ 100 chars, not ALL-CAPS. Matches commitlint.config.js allowlist. ✓


Security

  • PR_NUMBER is validated as numeric in the Resolve PR info step before reaching the guard script. No injection surface.
  • HEAD_SHA is fetched from GitHub's own API (headRefOid), not from user-controlled input.
  • jq -r outputs are used in grep herstrings and arithmetic, not in eval or unquoted shell expansions. ✓

Two follow-ups from PR #2561 review:

- Document the dismissal criteria's intent inline. The OR condition
  (`body_len < 200 || missing marker`) deliberately dismisses long
  reviews that lack the marker, because marker-less reviews break the
  downstream idempotency gate. Code comment explains this so future
  maintainers don't relax it.

- Idempotency-check the guard comment before posting. A manual workflow
  re-trigger combined with new bogus reviews on the same SHA could
  otherwise produce duplicate guard comments. The check greps PR
  comments for the existing
  `wheels-bot:review-a-guard:<pr>:<sha>` marker and skips re-posting
  when found.
@bpamiri
Copy link
Copy Markdown
Collaborator Author

bpamiri commented May 11, 2026

Addressed both non-blocking observations from Reviewer A in 00e2a8c:

  • Obs 2 (marker-free-but-long reviews get dismissed) — added an inline code comment in the YAML explaining that this is deliberate: a marker-less review breaks the downstream idempotency gate, so the guard treats prompt compliance as load-bearing rather than advisory. No behavioural change.
  • Obs 3 (duplicate guard comments on manual re-trigger) — added an idempotency check before posting. Greps existing PR comments for the wheels-bot:review-a-guard:<pr>:<sha> marker; skips re-posting when found.

Obs 1 (trailing-colon fix) was an affirmation, no action needed.

Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A

TL;DR: This follow-up commit addresses both findings from the previous review (SHA e38aebcae): it adds inline documentation explaining why the OR dismissal condition is intentional, and it adds an idempotency check so the guard comment is never posted twice for the same PR + SHA. Both fixes are correct. Verdict: approve.


Correctness

Dismissal-criteria comment is accurate and load-bearing.
Lines 156-162 of bot-review-a.yml now explain clearly that a long body without the marker still gets dismissed because a marker-less review breaks the downstream idempotency gate. The comment correctly captures the original reasoning and will prevent a future maintainer from relaxing the OR to AND. ✓

Guard-comment idempotency check is correct.
Lines 183-196 now compute guard_marker="wheels-bot:review-a-guard:${PR_NUMBER}:${HEAD_SHA}", fetch all PR issue comments via gh api ... --paginate, and filter with jq -r --arg m "$guard_marker" '[.[] | select(.body | contains($m))] | length'. If existing == "0", the comment is posted; otherwise it is skipped with a ::notice:: log line. The jq filter is correct — contains() on strings does substring matching, and length on the resulting array is an integer that compares correctly against "0" in [[ ... ]]. ✓

No regression to the original dismissal logic.
The dismissed counter and while IFS= read -r row loop are unchanged. The new idempotency block is gated on dismissed -gt 0, so the gh api comment-list call is only made when it is actually needed. ✓


Commits

Single commit ci: address Reviewer A feedback on guard logic — type ci, no scope, subject 100 chars, not ALL-CAPS. Matches commitlint.config.js allowlist. ✓


Security

guard_marker contains only alphanumeric characters and colons — no injection surface in the jq --arg call or the --body string. ✓

@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 11, 2026

Wheels Bot — Reviewer B (round 1)

A's review is accurate and well-supported. The two fixes A verified — the dismissal-criteria comment (why OR, not AND) and the guard-comment idempotency check — are both present and correct in the actual code. A's jq analysis is right: contains() is substring matching, length on the filtered array emits an integer, and [[ "$existing" == "0" ]] is a valid string comparison against jq's output. No sycophancy, no false positives, no missed issues. Verdict (approve) is consistent with the findings.

Sycophancy

None detected. A backed each claim with specific line references and code logic.

False positives

None detected. Verified the trailing-colon fix at line 100, the dismissal-criteria comment block, and the idempotency check logic against the current file. All accurate.

Missed issues

None detected. The prompt-file changes in .claude/commands/ are low-risk documentation tweaks; not mentioning them is fine. The known gap about Reviewer B not skipping dismissed reviews is called out in the PR description and is an orthogonal follow-up — A correctly left it for issue #2558.

Verdict alignment

A's approve is appropriate: both actionable findings from the prior review are addressed correctly and there are no new issues introduced.

Convergence

Aligned. A's analysis is accurate and the verdict is correct. The PR is review-clean at this SHA.

@bpamiri bpamiri merged commit 8a6e8a9 into develop May 11, 2026
6 checks passed
@bpamiri bpamiri deleted the peter/bot-reviewer-a-guards branch May 11, 2026 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wheels-bot: Reviewer A submitting placeholder content instead of analysis

1 participant