Skip to content

ci: retry blog visual baseline auto-push with fetch+rebase loop#2560

Merged
bpamiri merged 3 commits into
developfrom
peter/fix-blog-baseline-race
May 11, 2026
Merged

ci: retry blog visual baseline auto-push with fetch+rebase loop#2560
bpamiri merged 3 commits into
developfrom
peter/fix-blog-baseline-race

Conversation

@bpamiri
Copy link
Copy Markdown
Collaborator

@bpamiri bpamiri commented May 11, 2026

Summary

web-deploy.yml's "Commit pre-refreshed blog baseline" step pushed to develop with a single git push — no retry. When develop moved during the ~3 min job (e.g., another PR merging right behind a blog content push), the push was rejected with ! [rejected] HEAD -> develop (fetch first) and the auto-refreshed baseline was silently dropped.

Concrete failure mode this week:

Reference run: https://github.com/wheels-dev/wheels/actions/runs/25647803649

Fix

Wrap the push in a 5-attempt git fetch + rebase + push loop:

  • Push rejected → fetch + rebase + retry with short randomized backoff.
  • Rebase conflict → exit clean (would only happen if a parallel run already refreshed the baseline, making ours redundant).
  • 5 failures → fail the job so the failure is visible; next blog content push will retry automatically once the workflow ships.

What this PR does NOT fix

The currently-stale blog.png on develop. That's a downstream artifact of the bug and needs a separate chore(docs): PR (or a successful blog content push under the fixed workflow). PR #2559 will stay blocked on its [blog] FAIL until that lands.

Test plan

  • PR's own visual-regression job passes (no blog content change → loop never runs, same as before).
  • After merge: next blog content push to develop shows the loop in action; if the push succeeds first try, log line reads Pushed baseline refresh on attempt 1.

🤖 Generated with Claude Code

The "Commit pre-refreshed blog baseline" step in web-deploy.yml's visual-regression job
pushed to develop with a single `git push origin HEAD:develop` — no retry. When develop
moved during the ~3 min job runtime (e.g., another PR merging right behind a blog content
push), the push was rejected with `! [rejected] HEAD -> develop (fetch first)` and the
auto-refreshed baseline was silently dropped.

Effect: every subsequent non-blog PR tripped visual-regression on a stale baseline until
someone manually refreshed. Concrete trigger this week: PRs #2549 + #2551 merging
back-to-back left blog.png predating both "Background Jobs Without Redis" and
"Multi-Tenancy Built In" posts, blocking PR #2559 with 38,581-pixel diff.

Fix: wrap the push in a 5-attempt fetch+rebase+push loop. Rebase conflict path exits
clean (would only happen if a parallel run already refreshed the baseline, making ours
redundant). Push-rejected path retries with a short randomized backoff.

See run 25647803649 for the failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A

TL;DR: This PR correctly identifies and fixes a real race condition in the blog baseline auto-push: a single git push with no retry was silently dropped whenever develop moved during the ~3-minute job, leaving blog.png stale and tripping unrelated PRs. The fetch+rebase loop is a clean, idiomatic fix. Two minor nits below; neither is a correctness issue.


Correctness

The loop logic is sound across all cases:

  • Push rejected (develop advanced): git fetch + git rebase + retry correctly replays the single baseline commit on top of the new develop HEAD. Subsequent rebase attempts chain correctly — after rebasing onto D1, the next iteration rebases onto D2 (if develop moved again), because the merge base between the already-rebased commit and the new HEAD is always the previous origin/develop.
  • Rebase conflict (parallel job already refreshed blog.png): git rebase --abort + exit 0 is the right call — the baseline is already fresh on develop.
  • 5 failures: exit 1 surfaces the failure visibly, and the next blog-content push retriggers the job.
  • PR event (not push): steps.scope only runs on push, so blog_content_changed is empty on PRs and the step is gated out entirely. No regression risk.

Minor nit 1 — unnecessary sleep on final attempt.

.github/workflows/web-deploy.yml, loop body (the sleep line is inside the for body, unconditional):

for attempt in 1 2 3 4 5; do
  ...
  echo "Push rejected on attempt $attempt; will fetch + rebase + retry."
  sleep $((RANDOM % 5 + 2))   # <-- runs even when attempt == 5
done
echo "Failed to push baseline refresh after 5 attempts — ..."
exit 1

On the fifth rejected push, the script sleeps 2–6 seconds before printing the failure message and exiting. This is harmless in practice, but the sleep serves no purpose when there is no retry following it. Consider:

if [ "$attempt" -lt 5 ]; then
  sleep $((RANDOM % 5 + 2))
fi

Minor nit 2 — git rebase --abort is subject to set -e.

Inside the if ! git rebase origin/develop; then block, git rebase --abort runs under set -euo pipefail. If --abort itself were to fail (e.g., nothing to abort because the rebase state was already cleaned up), the step would exit 1 rather than the intended exit 0. This is a vanishingly unlikely edge case, but:

if ! git rebase origin/develop; then
  git rebase --abort || true    # defensive: --abort failing is still "done"
  echo "..."
  exit 0
fi

Commits

ci: retry blog visual baseline auto-push with fetch+rebase loop — valid type (ci), no scope (correct — CI workflow, not a framework layer), subject under 100 chars, not ALL-CAPS. Passes commitlint.


Tests / Docs

No CFML code changed; no test spec or doc update required.


@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 11, 2026

Wheels Bot -- Reviewer B (round 1)

A's review is accurate and well-reasoned. Both nits are verified against the actual diff; the correctness analysis is sound; no sycophancy. I'm aligned with A's assessment -- the fix is correct and merge-ready with minor cosmetic improvements available.

Sycophancy

None detected. A used COMMENTED rather than APPROVED despite finding the fix correct, which is the right level of restraint for a CI push-path change. The positive framing is earned.

False positives

None detected.

  • Nit 1 (sleep on final attempt): Confirmed. The sleep is inside the for loop body with no guard, so it fires unconditionally including when attempt == 5, immediately before the loop exhausts and falls through to exit 1. A's reading is accurate.
  • Nit 2 (git rebase --abort under set -e): Confirmed. The step explicitly sets set -euo pipefail at the top of its run: block (visible at line 177 of web-deploy.yml). git rebase --abort runs under that flag. A's suggested || true guard is technically valid.

Missed issues

One minor sub-point A didn't raise alongside Nit 1: the echo on the final rejected attempt prints "Push rejected on attempt $attempt; will fetch + rebase + retry." even when attempt == 5 and there is in fact no retry coming. The message is misleading on that iteration. This is cosmetic and paired with the unnecessary sleep -- both would be resolved by the same guard A suggested.

No other missed issues. This is a pure CI loop fix touching one workflow file and one PNG; no CFML code, no cross-engine concerns, no security surface changes.

Verdict alignment

A's COMMENTED verdict is consistent with the findings: two minor nits, no correctness issues, no blocking concerns. Appropriate.

Convergence

Aligned. The fix is correct across all three cases A analyzed (push rejected, rebase conflict, 5 failures), and both nits are minor non-blocking cosmetics. Joint recommendation: this PR is merge-ready as-is; addressing Nit 1 (sleep guard + misleading echo on final attempt) and Nit 2 (|| true) would be a small improvement but is not required before merge.

bpamiri and others added 2 commits May 10, 2026 21:54
Two pieces, both fallout from the race-condition bug this PR fixes:

1. Refresh `web/tests/visual-baselines/blog.png`. The May 9 → present
   auto-refresh pushes were dropped by the bug, so develop's baseline
   predates `Background Jobs Without Redis` and `Multi-Tenancy Built In`
   posts. Without this commit, PR #2560's own visual-regression check
   stays red and every non-blog PR keeps tripping. Regenerated locally
   via `pnpm --filter @wheels-dev/site-blog build` + `node
   scripts/visual-regression.mjs --update --site blog`; verified the
   freshly-written baseline passes a subsequent check run with 0-pixel
   diff.

2. Two minor nits from Reviewer A on the retry loop:
   - Guard the final-iteration `sleep` behind `[ "$attempt" -lt 5 ]` so
     the 5th rejected push doesn't sleep 2-6s before exiting.
   - Add `|| true` to `git rebase --abort` so a failed abort (e.g.,
     already-clean rebase state) doesn't flip the intended exit-0
     conflict path to exit-1 under `set -euo pipefail`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer B noted that the "Push rejected on attempt $attempt; will fetch + rebase + retry."
echo runs unconditionally inside the for-loop body, so on attempt 5 it lies — there's no
retry coming, the loop falls through to exit 1. Move the echo inside the same
`[ "$attempt" -lt 5 ]` guard that already skips the backoff sleep, and emit a distinct
"final attempt, no retry" message in the else branch.

Pure cosmetic; no control-flow change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A

TL;DR: This follow-on commit addresses both nits from the previous review exactly as suggested — no new issues introduced. The blog.png refresh is the right accompanying change: it unblocks the PR's own visual-regression check and removes the downstream block on #2559. Approving.


Correctness

Both nits have been resolved precisely:

Nit 1 — sleep guard: The if [ "$attempt" -lt 5 ] check now correctly skips the 2–6s backoff on the final iteration, where no retry follows.

if [ "$attempt" -lt 5 ]; then
  sleep $((RANDOM % 5 + 2))
fi

Nit 2 — git rebase --abort under set -e: The || true guard prevents a spurious exit-1 if --abort itself fails (e.g., already-clean rebase state).

git rebase --abort || true

No new code paths were added; the retry loop logic reviewed in round 1 is unchanged.

Baseline image: The commit message describes regenerating blog.png locally via pnpm --filter @wheels-dev/site-blog build + node scripts/visual-regression.mjs --update --site blog and verifying a 0-pixel-diff pass afterward. This is the correct process and the right bundling point — the stale baseline is a direct artifact of the race-condition bug this PR fixes, and without the refresh the PR's own check stays red.


Commits

Both commits conform to commitlint:

Commit Type Scope Subject
4ff53ab ci none retry blog visual baseline auto-push with fetch+rebase loop
1da1ab8 ci none refresh stale blog baseline and address review nits

Types are in the allowlist; no scope is the correct call for CI workflow changes; subjects are under 100 chars and not ALL-CAPS.


Tests / Docs

No CFML code changed; no spec or .ai/wheels/ update required.

Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot review for acc944f

The third and final commit (acc944f) is a pure cosmetic tightening: the misleading "will retry" echo on the final loop iteration has been moved inside the same guard as the backoff sleep, with an accurate "final attempt, no retry" message in the else branch. No control-flow change. The substantive fix was already approved at 1da1ab8. Approving the updated head.

Correctness: the change does exactly what the commit message claims. Before, attempt 5 printed "will fetch + rebase + retry" then fell through to exit 1 with no retry -- factually wrong. After, the echo is gated on attempt -lt 5, and the else branch correctly says "final attempt, no retry". The retry loop, rebase conflict handling (git rebase --abort || true), and the sleep guard from the previous round are all unchanged.

Commits: all three conform to commitlint. Types (ci) from allowlist; no scope is correct for CI workflow changes; subjects under 100 chars, none ALL-CAPS.

Tests / Docs: no CFML code changed; no spec or .ai/wheels/ update required.

wheels-bot-review-a-2560-acc944f771472de4d3959a83250dd29da14ff5d6

@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 11, 2026

No description provided.

@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 11, 2026

Wheels Bot — Reviewer B (round 1)

A's review is accurate. The two nits from round 1 are verified as fixed in the diff, and A's APPROVED verdict is earned. One small documentation gap: A's commit table lists two commits but the PR has three. The third (acc944f77ci: don't print misleading "will retry" message on final attempt) is omitted from the table. Its changes are implicitly described in A's prose for Nit 1, and the commit itself conforms to commitlint (correct type, no scope, under 100 chars), so this is cosmetic only.

Sycophancy

None detected. A approved a CI push-path change with concrete line-level verification — not a bare LGTM.

False positives

None detected.

  • Nit 1 (sleep guard): Confirmed in diff. The if [ "$attempt" -lt 5 ] block correctly gates both the sleep and the non-final echo. A's description is accurate.
  • Nit 2 (git rebase --abort || true): Confirmed in diff. The guard is present. A's description is accurate.

Missed issues

One documentation gap: A's commit table lists two commits (4ff53ab, 1da1ab8) but the PR contains three. The third commit is acc944f77 (ci: don't print misleading "will retry" message on final attempt). Its changes (the else branch with the corrected final-attempt echo message) are visible in the diff and described in A's prose, just not captured in the table. The commit conforms to commitlint; no correctness or scope concern.

No other missed issues. This is a CI-only change (one workflow file, one binary PNG) with no CFML code, no cross-engine concerns, and no security surface.

Verdict alignment

A's APPROVED verdict is consistent with the findings: both nits resolved, no new code paths, no blocking concerns.

Convergence

Aligned. The fix is correct, both nits are addressed, and the third commit is valid. Joint recommendation: merge-ready as-is.

@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 11, 2026

Wheels Bot -- Reviewer B (round 1)

A's review is accurate. The commit is exactly the cosmetic fix described, both nits from round 1 are resolved, and the APPROVED verdict is appropriate for a purely cosmetic change on top of an already-approved substantive fix.

Sycophancy

None detected. A approved with concrete reasoning -- citing the specific change (echo moved inside guard, accurate else-branch message) and explicitly confirming no control-flow change. The positive framing is earned.

False positives

None detected.

  • "Pure cosmetic tightening, no control-flow change": Confirmed. The if [ "$attempt" -lt 5 ] guard now wraps both the echo and the sleep. The rebase-conflict path, retry count, and exit conditions are identical to 1da1ab8.
  • Accurate else-branch message: Confirmed. The else branch emits "final attempt, no retry" which is factually correct at attempt 5.
  • Commitlint conformance: Confirmed. All three commits use ci: type, no scope (correct for .github/workflows/ per CLAUDE.md), subjects under 100 chars, none ALL-CAPS.

Missed issues

None detected. This diff touches one workflow file (shell loop cosmetics) and one PNG. No CFML code, no cross-engine concerns, no security surface. B's minor sub-point from round 1 (misleading echo on final attempt) is exactly what acc944f addressed.

Verdict alignment

A's APPROVED verdict is consistent with the findings: this is a cosmetic-only commit fixing the last nit B raised in round 1, with no blocking concerns.

Convergence

Aligned. Both nits from round 1 (sleep-on-final-attempt and misleading echo) are resolved across 1da1ab8 and acc944f. A's APPROVED verdict is correct. Joint recommendation: merge-ready.

@bpamiri bpamiri merged commit 1b40ee9 into develop May 11, 2026
12 of 13 checks passed
@bpamiri bpamiri deleted the peter/fix-blog-baseline-race branch May 11, 2026 05:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant