ci: retry blog visual baseline auto-push with fetch+rebase loop#2560
Conversation
The "Commit pre-refreshed blog baseline" step in web-deploy.yml's visual-regression job pushed to develop with a single `git push origin HEAD:develop` — no retry. When develop moved during the ~3 min job runtime (e.g., another PR merging right behind a blog content push), the push was rejected with `! [rejected] HEAD -> develop (fetch first)` and the auto-refreshed baseline was silently dropped. Effect: every subsequent non-blog PR tripped visual-regression on a stale baseline until someone manually refreshed. Concrete trigger this week: PRs #2549 + #2551 merging back-to-back left blog.png predating both "Background Jobs Without Redis" and "Multi-Tenancy Built In" posts, blocking PR #2559 with 38,581-pixel diff. Fix: wrap the push in a 5-attempt fetch+rebase+push loop. Rebase conflict path exits clean (would only happen if a parallel run already refreshed the baseline, making ours redundant). Push-rejected path retries with a short randomized backoff. See run 25647803649 for the failure mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Wheels Bot — Reviewer A
TL;DR: This PR correctly identifies and fixes a real race condition in the blog baseline auto-push: a single git push with no retry was silently dropped whenever develop moved during the ~3-minute job, leaving blog.png stale and tripping unrelated PRs. The fetch+rebase loop is a clean, idiomatic fix. Two minor nits below; neither is a correctness issue.
Correctness
The loop logic is sound across all cases:
- Push rejected (develop advanced):
git fetch + git rebase + retrycorrectly replays the single baseline commit on top of the newdevelopHEAD. Subsequent rebase attempts chain correctly — after rebasing onto D1, the next iteration rebases onto D2 (if develop moved again), because the merge base between the already-rebased commit and the new HEAD is always the previousorigin/develop. - Rebase conflict (parallel job already refreshed
blog.png):git rebase --abort+exit 0is the right call — the baseline is already fresh ondevelop. - 5 failures:
exit 1surfaces the failure visibly, and the next blog-content push retriggers the job. - PR event (not push):
steps.scopeonly runs onpush, soblog_content_changedis empty on PRs and the step is gated out entirely. No regression risk.
Minor nit 1 — unnecessary sleep on final attempt.
.github/workflows/web-deploy.yml, loop body (the sleep line is inside the for body, unconditional):
for attempt in 1 2 3 4 5; do
...
echo "Push rejected on attempt $attempt; will fetch + rebase + retry."
sleep $((RANDOM % 5 + 2)) # <-- runs even when attempt == 5
done
echo "Failed to push baseline refresh after 5 attempts — ..."
exit 1On the fifth rejected push, the script sleeps 2–6 seconds before printing the failure message and exiting. This is harmless in practice, but the sleep serves no purpose when there is no retry following it. Consider:
if [ "$attempt" -lt 5 ]; then
sleep $((RANDOM % 5 + 2))
fiMinor nit 2 — git rebase --abort is subject to set -e.
Inside the if ! git rebase origin/develop; then block, git rebase --abort runs under set -euo pipefail. If --abort itself were to fail (e.g., nothing to abort because the rebase state was already cleaned up), the step would exit 1 rather than the intended exit 0. This is a vanishingly unlikely edge case, but:
if ! git rebase origin/develop; then
git rebase --abort || true # defensive: --abort failing is still "done"
echo "..."
exit 0
fiCommits
ci: retry blog visual baseline auto-push with fetch+rebase loop — valid type (ci), no scope (correct — CI workflow, not a framework layer), subject under 100 chars, not ALL-CAPS. Passes commitlint.
Tests / Docs
No CFML code changed; no test spec or doc update required.
Wheels Bot -- Reviewer B (round 1)A's review is accurate and well-reasoned. Both nits are verified against the actual diff; the correctness analysis is sound; no sycophancy. I'm aligned with A's assessment -- the fix is correct and merge-ready with minor cosmetic improvements available. SycophancyNone detected. A used COMMENTED rather than APPROVED despite finding the fix correct, which is the right level of restraint for a CI push-path change. The positive framing is earned. False positivesNone detected.
Missed issuesOne minor sub-point A didn't raise alongside Nit 1: the No other missed issues. This is a pure CI loop fix touching one workflow file and one PNG; no CFML code, no cross-engine concerns, no security surface changes. Verdict alignmentA's COMMENTED verdict is consistent with the findings: two minor nits, no correctness issues, no blocking concerns. Appropriate. ConvergenceAligned. The fix is correct across all three cases A analyzed (push rejected, rebase conflict, 5 failures), and both nits are minor non-blocking cosmetics. Joint recommendation: this PR is merge-ready as-is; addressing Nit 1 (sleep guard + misleading echo on final attempt) and Nit 2 ( |
Two pieces, both fallout from the race-condition bug this PR fixes: 1. Refresh `web/tests/visual-baselines/blog.png`. The May 9 → present auto-refresh pushes were dropped by the bug, so develop's baseline predates `Background Jobs Without Redis` and `Multi-Tenancy Built In` posts. Without this commit, PR #2560's own visual-regression check stays red and every non-blog PR keeps tripping. Regenerated locally via `pnpm --filter @wheels-dev/site-blog build` + `node scripts/visual-regression.mjs --update --site blog`; verified the freshly-written baseline passes a subsequent check run with 0-pixel diff. 2. Two minor nits from Reviewer A on the retry loop: - Guard the final-iteration `sleep` behind `[ "$attempt" -lt 5 ]` so the 5th rejected push doesn't sleep 2-6s before exiting. - Add `|| true` to `git rebase --abort` so a failed abort (e.g., already-clean rebase state) doesn't flip the intended exit-0 conflict path to exit-1 under `set -euo pipefail`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer B noted that the "Push rejected on attempt $attempt; will fetch + rebase + retry." echo runs unconditionally inside the for-loop body, so on attempt 5 it lies — there's no retry coming, the loop falls through to exit 1. Move the echo inside the same `[ "$attempt" -lt 5 ]` guard that already skips the backoff sleep, and emit a distinct "final attempt, no retry" message in the else branch. Pure cosmetic; no control-flow change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Wheels Bot — Reviewer A
TL;DR: This follow-on commit addresses both nits from the previous review exactly as suggested — no new issues introduced. The blog.png refresh is the right accompanying change: it unblocks the PR's own visual-regression check and removes the downstream block on #2559. Approving.
Correctness
Both nits have been resolved precisely:
Nit 1 — sleep guard: The if [ "$attempt" -lt 5 ] check now correctly skips the 2–6s backoff on the final iteration, where no retry follows.
if [ "$attempt" -lt 5 ]; then
sleep $((RANDOM % 5 + 2))
fiNit 2 — git rebase --abort under set -e: The || true guard prevents a spurious exit-1 if --abort itself fails (e.g., already-clean rebase state).
git rebase --abort || trueNo new code paths were added; the retry loop logic reviewed in round 1 is unchanged.
Baseline image: The commit message describes regenerating blog.png locally via pnpm --filter @wheels-dev/site-blog build + node scripts/visual-regression.mjs --update --site blog and verifying a 0-pixel-diff pass afterward. This is the correct process and the right bundling point — the stale baseline is a direct artifact of the race-condition bug this PR fixes, and without the refresh the PR's own check stays red.
Commits
Both commits conform to commitlint:
| Commit | Type | Scope | Subject |
|---|---|---|---|
4ff53ab |
ci |
none | retry blog visual baseline auto-push with fetch+rebase loop |
1da1ab8 |
ci |
none | refresh stale blog baseline and address review nits |
Types are in the allowlist; no scope is the correct call for CI workflow changes; subjects are under 100 chars and not ALL-CAPS.
Tests / Docs
No CFML code changed; no spec or .ai/wheels/ update required.
There was a problem hiding this comment.
Wheels Bot review for acc944f
The third and final commit (acc944f) is a pure cosmetic tightening: the misleading "will retry" echo on the final loop iteration has been moved inside the same guard as the backoff sleep, with an accurate "final attempt, no retry" message in the else branch. No control-flow change. The substantive fix was already approved at 1da1ab8. Approving the updated head.
Correctness: the change does exactly what the commit message claims. Before, attempt 5 printed "will fetch + rebase + retry" then fell through to exit 1 with no retry -- factually wrong. After, the echo is gated on attempt -lt 5, and the else branch correctly says "final attempt, no retry". The retry loop, rebase conflict handling (git rebase --abort || true), and the sleep guard from the previous round are all unchanged.
Commits: all three conform to commitlint. Types (ci) from allowlist; no scope is correct for CI workflow changes; subjects under 100 chars, none ALL-CAPS.
Tests / Docs: no CFML code changed; no spec or .ai/wheels/ update required.
wheels-bot-review-a-2560-acc944f771472de4d3959a83250dd29da14ff5d6
|
No description provided. |
Wheels Bot — Reviewer B (round 1)A's review is accurate. The two nits from round 1 are verified as fixed in the diff, and A's APPROVED verdict is earned. One small documentation gap: A's commit table lists two commits but the PR has three. The third ( SycophancyNone detected. A approved a CI push-path change with concrete line-level verification — not a bare LGTM. False positivesNone detected.
Missed issuesOne documentation gap: A's commit table lists two commits ( No other missed issues. This is a CI-only change (one workflow file, one binary PNG) with no CFML code, no cross-engine concerns, and no security surface. Verdict alignmentA's APPROVED verdict is consistent with the findings: both nits resolved, no new code paths, no blocking concerns. ConvergenceAligned. The fix is correct, both nits are addressed, and the third commit is valid. Joint recommendation: merge-ready as-is. |
Wheels Bot -- Reviewer B (round 1)A's review is accurate. The commit is exactly the cosmetic fix described, both nits from round 1 are resolved, and the APPROVED verdict is appropriate for a purely cosmetic change on top of an already-approved substantive fix. SycophancyNone detected. A approved with concrete reasoning -- citing the specific change (echo moved inside guard, accurate else-branch message) and explicitly confirming no control-flow change. The positive framing is earned. False positivesNone detected.
Missed issuesNone detected. This diff touches one workflow file (shell loop cosmetics) and one PNG. No CFML code, no cross-engine concerns, no security surface. B's minor sub-point from round 1 (misleading echo on final attempt) is exactly what acc944f addressed. Verdict alignmentA's APPROVED verdict is consistent with the findings: this is a cosmetic-only commit fixing the last nit B raised in round 1, with no blocking concerns. ConvergenceAligned. Both nits from round 1 (sleep-on-final-attempt and misleading echo) are resolved across 1da1ab8 and acc944f. A's APPROVED verdict is correct. Joint recommendation: merge-ready. |
Summary
web-deploy.yml's "Commit pre-refreshed blog baseline" step pushed todevelopwith a singlegit push— no retry. Whendevelopmoved during the ~3 min job (e.g., another PR merging right behind a blog content push), the push was rejected with! [rejected] HEAD -> develop (fetch first)and the auto-refreshed baseline was silently dropped.Concrete failure mode this week:
Background Jobs Without Redis,Multi-Tenancy Built In) lost the push race.web/tests/visual-baselines/blog.pngondevelopis now ~2 posts stale.[blog] FAIL — 38,581 pixels differ (threshold: 300).Reference run: https://github.com/wheels-dev/wheels/actions/runs/25647803649
Fix
Wrap the push in a 5-attempt
git fetch + rebase + pushloop:What this PR does NOT fix
The currently-stale
blog.pngondevelop. That's a downstream artifact of the bug and needs a separatechore(docs):PR (or a successful blog content push under the fixed workflow). PR #2559 will stay blocked on its[blog] FAILuntil that lands.Test plan
visual-regressionjob passes (no blog content change → loop never runs, same as before).developshows the loop in action; if the push succeeds first try, log line readsPushed baseline refresh on attempt 1.🤖 Generated with Claude Code