ci: harden wheels-bot Reviewer A against placeholder submissions#2561
Conversation
Closes #2558. PR #2546's Reviewer A submitted a review with body "test body" (state=APPROVED, no marker) before issuing the substantive review on the same SHA 38 seconds later. Root-cause hypothesis: Sonnet probed `gh pr review` with a placeholder before the real call. Three changes harden the pipeline: - bot-review-a.yml: new "Validate Reviewer A output" step. Scans all active wheels-bot reviews on the head SHA after the Claude action exits and auto-dismisses any whose body is under 200 chars or missing the canonical `wheels-bot:review-a` marker. Runs on `always()` so it fires even when the Claude step failed mid-session. This is the load-bearing fix — prompt instructions cannot prevent the failure mode on their own. - bot-review-a.yml: remove trailing colon from the idempotency marker-pattern. The initial review marker is `<!-- wheels-bot:review-a:<pr>:<sha> -->` with no suffix; the trailing `:` made the skip check a no-op. Response markers use the distinct `wheels-bot:review-a-response:` prefix and won't false-match. - review-pr.md and respond-to-critique.md: explicit instruction to issue exactly one `gh pr review` per session and never probe the command with a placeholder body. Belt-and-suspenders on top of the guard.
There was a problem hiding this comment.
Wheels Bot — Reviewer A
TL;DR: This PR fixes a genuine idempotency no-op bug in bot-review-a.yml (trailing colon in marker-pattern), adds a post-submission guard that auto-dismisses bogus placeholder reviews, and tightens the prompt instructions in both review-pr.md and respond-to-critique.md. The changes are coherent, the shell logic is sound, and the YAML formatting is correct. Verdict: comment — two minor observations worth the author's attention, but neither is a blocker.
Correctness
Trailing colon fix is correct and load-bearing.
The wheels-bot-skip-check action uses grep -Eq "$MARKER_PATTERN" against the haystack of existing review bodies. The actual marker that Reviewer A embeds is <!-- wheels-bot:review-a:<pr>:<sha> --> — no trailing colon. The old pattern wheels-bot:review-a:2561:abc123: would never match it, so the idempotency gate was silently a no-op on every PR since the feature shipped. The fix removes the colon and is correct.
Guard logic is correct, but the OR condition dismisses marker-free-but-long reviews.
.github/workflows/bot-review-a.yml lines 152–157:
if [[ "$body_len" -lt 200 ]] || ! grep -q 'wheels-bot:review-a' <<<"$body"; thenA substantive (>200 char) review that forgot the marker — e.g. if the model produced a real review but omitted the closing HTML comment — would be dismissed here. That is intentional (without a marker the idempotency mechanism stays broken), but it means the guard is load-bearing for prompt compliance. The PR description acknowledges this correctly. No action required; just flag it in a follow-up comment so future maintainers understand the logic.
grep -q 'wheels-bot:review-a' correctly passes response reviews.
The response marker format (from respond-to-critique.md step 4) is <!-- wheels-bot:review-a-response:<pr>:<sha>:<N> -->, which contains wheels-bot:review-a as a substring. Response reviews would not be dismissed. ✓
Guard comment has no idempotency marker check before posting.
bot-review-a.yml line 168 onwards: the guard posts a ## Wheels Bot — Reviewer A guard comment when dismissed > 0, but does not first check whether a wheels-bot:review-a-guard:${PR_NUMBER}:${HEAD_SHA} comment already exists. The practical risk is low — once reviews are dismissed they are filtered by select(.state != "DISMISSED") so the dismissed counter would be 0 on a re-run — but a manual workflow re-trigger combined with new bogus reviews appearing on the same SHA could produce duplicate guard comments. Not a blocker; flagging for awareness.
Tests
No automated tests cover bot infrastructure, which the PR correctly acknowledges. The manual verification plan is adequate for this type of change. The framework test suite is unaffected.
Docs
Checklist is filled honestly. The "Wheels Bot" section of CLAUDE.md describes the guard conceptually correctly post-merge (the per-step detail lives in the workflow comment, which is the right place). No CHANGELOG entry needed for bot infra. ✓
Commits
Single commit ci: harden wheels-bot Reviewer A against placeholder submissions — type ci, no scope, subject ≤ 100 chars, not ALL-CAPS. Matches commitlint.config.js allowlist. ✓
Security
PR_NUMBERis validated as numeric in theResolve PR infostep before reaching the guard script. No injection surface.HEAD_SHAis fetched from GitHub's own API (headRefOid), not from user-controlled input.jq -routputs are used ingrepherstrings and arithmetic, not inevalor unquoted shell expansions. ✓
Two follow-ups from PR #2561 review: - Document the dismissal criteria's intent inline. The OR condition (`body_len < 200 || missing marker`) deliberately dismisses long reviews that lack the marker, because marker-less reviews break the downstream idempotency gate. Code comment explains this so future maintainers don't relax it. - Idempotency-check the guard comment before posting. A manual workflow re-trigger combined with new bogus reviews on the same SHA could otherwise produce duplicate guard comments. The check greps PR comments for the existing `wheels-bot:review-a-guard:<pr>:<sha>` marker and skips re-posting when found.
|
Addressed both non-blocking observations from Reviewer A in 00e2a8c:
Obs 1 (trailing-colon fix) was an affirmation, no action needed. |
There was a problem hiding this comment.
Wheels Bot — Reviewer A
TL;DR: This follow-up commit addresses both findings from the previous review (SHA e38aebcae): it adds inline documentation explaining why the OR dismissal condition is intentional, and it adds an idempotency check so the guard comment is never posted twice for the same PR + SHA. Both fixes are correct. Verdict: approve.
Correctness
Dismissal-criteria comment is accurate and load-bearing.
Lines 156-162 of bot-review-a.yml now explain clearly that a long body without the marker still gets dismissed because a marker-less review breaks the downstream idempotency gate. The comment correctly captures the original reasoning and will prevent a future maintainer from relaxing the OR to AND. ✓
Guard-comment idempotency check is correct.
Lines 183-196 now compute guard_marker="wheels-bot:review-a-guard:${PR_NUMBER}:${HEAD_SHA}", fetch all PR issue comments via gh api ... --paginate, and filter with jq -r --arg m "$guard_marker" '[.[] | select(.body | contains($m))] | length'. If existing == "0", the comment is posted; otherwise it is skipped with a ::notice:: log line. The jq filter is correct — contains() on strings does substring matching, and length on the resulting array is an integer that compares correctly against "0" in [[ ... ]]. ✓
No regression to the original dismissal logic.
The dismissed counter and while IFS= read -r row loop are unchanged. The new idempotency block is gated on dismissed -gt 0, so the gh api comment-list call is only made when it is actually needed. ✓
Commits
Single commit ci: address Reviewer A feedback on guard logic — type ci, no scope, subject 100 chars, not ALL-CAPS. Matches commitlint.config.js allowlist. ✓
Security
guard_marker contains only alphanumeric characters and colons — no injection surface in the jq --arg call or the --body string. ✓
Wheels Bot — Reviewer B (round 1)A's review is accurate and well-supported. The two fixes A verified — the dismissal-criteria comment (why OR, not AND) and the guard-comment idempotency check — are both present and correct in the actual code. A's jq analysis is right: SycophancyNone detected. A backed each claim with specific line references and code logic. False positivesNone detected. Verified the trailing-colon fix at line 100, the dismissal-criteria comment block, and the idempotency check logic against the current file. All accurate. Missed issuesNone detected. The prompt-file changes in Verdict alignmentA's approve is appropriate: both actionable findings from the prior review are addressed correctly and there are no new issues introduced. ConvergenceAligned. A's analysis is accurate and the verdict is correct. The PR is review-clean at this SHA. |
Summary
Three changes harden the wheels-bot Reviewer A pipeline against the placeholder-submission failure mode observed on #2546:
bot-review-a.yml): new "Validate Reviewer A output" step runs after the Claude action (withif: always()). Scans all active wheels-bot reviews on the head SHA and auto-dismisses any whose body is shorter than 200 characters or missing the canonicalwheels-bot:review-amarker. Posts a one-line comment summarizing dismissals. This is the load-bearing fix — prompt instructions alone cannot prevent the model from probinggh pr reviewwith a placeholder.bot-review-a.yml:96): the initial review marker is<!-- wheels-bot:review-a:<pr>:<sha> -->with no suffix, but the idempotencymarker-patternended with:, making the skip check a no-op. Response markers usewheels-bot:review-a-response:(different prefix) so removing the trailing colon doesn't false-match them.review-pr.md,respond-to-critique.md): explicit instruction to issue exactly onegh pr reviewper session and never probe the command with a placeholder body. Belt-and-suspenders on top of the guard.Related Issue
Closes #2558
Type of Change
Feature Completeness Checklist
bash tools/test-local.sh. Manual verification plan below.bot-review-a.ymland in the prompt files.Test Plan
The guard cannot be exercised pre-merge because Reviewer A only runs on PRs that match its
if:conditions. Verification once merged:::notice::No active wheels-bot reviews on <sha> to validateOR no dismissals.synchronizepush without changing the SHA (viagit commit --amend --no-edit && git push --force-with-leaseon the same head), confirm Reviewer A's second invocation skips withidempotency marker already present.test bodyviagh pr reviewimpersonating the bot (or trigger Reviewer A with a deliberate placeholder injection if convenient), confirm the next workflow run dismisses it and posts the guard comment.Reviewer notes
pull_requests: write, which the App already has (it submits reviews via the same token).review-the-review.md) is not updated in this PR to skip dismissed reviews. Worth a follow-up — when a real placeholder happens, B still sees the dismissed review as A's "latest" and may critique it. Listed as a checkbox in issue wheels-bot: Reviewer A submitting placeholder content instead of analysis #2558.propose-fix.md(so future bot-infra issues are held for human review instead of attempting TDD on workflow YAML) is the orthogonal follow-up from wheels-bot: Reviewer A submitting placeholder content instead of analysis #2558 — not in this PR.