Skip to content

[codex] Apply generated cpflow GitHub Actions flow#732

Draft
justin808 wants to merge 10 commits intomasterfrom
codex/cpflow-github-actions-flow
Draft

[codex] Apply generated cpflow GitHub Actions flow#732
justin808 wants to merge 10 commits intomasterfrom
codex/cpflow-github-actions-flow

Conversation

@justin808
Copy link
Copy Markdown
Member

@justin808 justin808 commented Apr 30, 2026

Summary

  • Replace the older hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set from cpflow generate-github-actions.
  • Keep review apps opt-in, staging deploys branch-gated, production promotion manual, and nightly review-app cleanup in the generated flow.
  • Update Control Plane docs links so they point at the new cpflow-* action/workflow names.

Verification

  • BUNDLE_GEMFILE=/tmp/cpflow-pr-278/Gemfile bin/conductor-exec bundle exec ruby /tmp/cpflow-pr-278/bin/cpflow github-flow-readiness — no blocking readiness issues detected.
  • bin/conductor-exec ruby -e 'require "yaml"; paths = Dir[".github/**/*.yml"] + Dir[".github/**/*.yaml"] + Dir[".controlplane/**/*.yml"]; paths.sort.each { |path| YAML.load_file(path, aliases: true); puts path }'
  • bin/conductor-exec git diff --cached --check
  • bin/conductor-exec bundle exec rubocop

Not Run / Blocked

  • bin/conductor-exec bundle exec rspec could not complete locally because PostgreSQL was not running at /tmp/.s.PGSQL.5432; after the first Rails load failure, Coveralls setup also reported repeated coverage initialization errors.

Note

Medium Risk
Replaces core deployment automation (review apps, staging deploys, production promotion) and introduces new secret/variable handling plus deletion safeguards, so misconfiguration could block deploys or affect environments.

Overview
Migrates Control Plane CI/CD from the older hand-written GitHub Actions to the generated cpflow-* composite actions and workflows, covering review-app deploy/delete, staging deploys, production promotion, and nightly stale review-app cleanup.

The new flow makes review apps opt-in via the exact /deploy-review-app comment (with auto-redeploy on new commits once created), adds a /delete-review-app path plus PR-close cleanup, and gates fork PR behavior to avoid secret exposure.

Updates the build/deploy plumbing with configurable @controlplane/cli + cpflow versions, optional Docker build SSH/extra args, safer token handling, a release-phase auto-detect flag, and stronger deletion safety via REVIEW_APP_PREFIX checks; docs are updated to match the new workflow names and required repo secrets/variables.

Reviewed by Cursor Bugbot for commit fee6ce9. Bugbot is set up for automated code reviews on this repo. Configure here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc374662-c575-4231-a84c-d951251043fd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/cpflow-github-actions-flow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Upstream token not passed to copy-image command
    • Added the required -t upstream token argument to the cpflow copy-image-from-upstream command while keeping the token sourced from secrets via the environment.

Create PR

Or push these changes by commenting:

@cursor push 3606e16506
Preview (3606e16506)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -203,14 +203,13 @@
 
       - name: Copy image from staging
         env:
-          # Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
           CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
           PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
           CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
         shell: bash
         run: |
           set -euo pipefail
-          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
+          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" -t "${CPLN_UPSTREAM_TOKEN}" --org "${CPLN_ORG_PRODUCTION}"
 
       - name: Deploy image to production
         env:

You can send follow-ups to the cloud agent here.

shell: bash
run: |
set -euo pipefail
cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream token not passed to copy-image command

High Severity

The copy-image-from-upstream step sets CPLN_UPSTREAM_TOKEN as an environment variable but never passes the upstream token to the cpflow command. The cpflow documentation states this command requires --upstream-token or -t flag — there is no documented support for a CPLN_UPSTREAM_TOKEN env var in cpflow 4.2.0 (the version installed by cpflow-setup-environment). The old workflow correctly used -t ${{ secrets.CPLN_TOKEN_STAGING }}. Without the flag, the command cannot authenticate with the staging org and production promotion will fail at the image copy step.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit eb1dbb2. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this unchanged. cpflow copy-image-from-upstream in cpflow 4.2.0 supports CPLN_UPSTREAM_TOKEN as an env var, so passing -t here would reintroduce the token-in-process-argv exposure this generated flow is intentionally avoiding. Reference: https://github.com/shakacode/control-plane-flow/blob/add-github-flow-generator/lib/command/copy_image_from_upstream.rb#L34-L36

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow GitHub Actions Migration

This is a solid migration from hand-written to generated cpflow-* workflows. The overall architecture is a clear improvement. Here's a summary of what stands out.


Security — meaningful fixes

  • Token exposure fixed: The old setup-environment action passed the token as --token "$TOKEN" (visible in /proc/<pid>/cmdline and ps). The new action correctly uses CPLN_TOKEN env var. Same for CPLN_UPSTREAM_TOKEN in the promote workflow.
  • Critical: /deploy-review-app now requires author_association. The old workflow had no author_association check, meaning any GitHub user who could comment on a PR could trigger a deployment. The new check ("OWNER","MEMBER","COLLABORATOR") correctly gates this.
  • Exact command match (== '/deploy-review-app') replaces contains(). Prevents accidental triggers from comments like "I ran /deploy-review-app earlier and it worked".
  • SSH key isolation: The cpflow-build-docker-image action scopes DOCKER_BUILD_SSH_KEY to a dedicated step so it never appears in the build step's environment — even when ACTIONS_STEP_DEBUG=true is set. The EXIT trap cleans up the key file and kills the agent. Well designed.
  • Fork PR handling: Fork PRs correctly skip auto-deploy (which can't access secrets) while still allowing approved collaborators to deploy via /deploy-review-app. The cpflow-review-app-help.yml is explicit about why pull_request_target is used there (no code checkout = safe for write access from forks).

Bugs fixed

  • cancel-in-progress: false on all deploy workflows. The old staging workflow used cancel-in-progress: true, which could interrupt a cpflow deploy-image mid-rollout and leave the workload partially updated. Now fixed across all three deployment workflows.
  • Delete on PR close now uses pull_request_target. The old delete-review-app.yml used pull_request: types: [closed], which runs in a restricted context without secret access when a fork PR closes. The new workflow correctly uses pull_request_target for the close event.
  • --run-release-phase is now conditional. The old staging deploy hardcoded it unconditionally; the new workflows detect release_script: presence at runtime before deciding.
  • Delete safety uses prefix allowlist, not blocklist. The old delete-app.sh used grep -iqE '(production|staging)' to block "dangerous" names — fragile against future app naming. The new delete-app.sh requires the app name to start with REVIEW_APP_PREFIX, which is a stronger guarantee.

Potential issues

  1. cpflow exists in the deploy workflow doesn't distinguish auth errors from "not found" (see inline comment). The delete-app.sh carefully pattern-matches error output to surface API/auth failures, but the deploy workflow's check-app step treats all non-zero exits as exists=false. An intermittent auth error on a push event would silently skip the deploy.

  2. staging deploy job has no needs: validate-branch: The deploy job correctly declares needs: [validate-branch, build] and has an if: guard, but if validate-branch produces no output (e.g., first run where the variable step fails), the if: could evaluate against an empty string. In practice the validation step exits non-zero, so build never runs and deploy is skipped by the needs: dependency. This is fine but the if: guard is slightly redundant with the needs: dependency on validate-branch.

  3. Help command has no contents: read permission: cpflow-help-command.yml declares only issues: write and pull-requests: write. There's no explicit contents: read but GitHub's default GITHUB_TOKEN grants read on public repos, so this works — just worth noting that it relies on the repo's default permissions.


Nits

  • cpflow-deploy-staging.yml uses GITHUB_REF_NAME (simpler), whereas the old workflow used ${GITHUB_REF#refs/heads/} (strip-prefix approach that would silently behave oddly for tag refs). Small but cleaner.
  • The shakacode-team.md link fix (-info org URL corrected to shakacode-open-source-examples-production) is a separate correctness fix bundled in cleanly.

Overall this is well-structured with meaningful security and reliability improvements. The one item I'd prioritize is the cpflow exists error handling consistency noted in the inline comment.

if cpflow exists -a "${APP_NAME}" --org "${CPLN_ORG}"; then
echo "exists=true" >> "$GITHUB_OUTPUT"
else
echo "exists=false" >> "$GITHUB_OUTPUT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else around cpflow exists conflates two different failure modes: "app does not exist" (expected non-zero) and "API/auth error" (unexpected non-zero). Both land in the else branch and set exists=false.

Compare delete-app.sh in this same PR, which carefully pattern-matches the output ("Double check your org", "Unknown API token format", "ERROR", etc.) to distinguish a real failure from a genuine not-found and surface it as a hard error.

If there's a transient auth issue when a PR is pushed, this step sets exists=false and the pull_request path silently writes a step-summary saying the app doesn't exist yet — masking the real problem until the deploy is manually retried.

Suggested approach: mirror delete-app.sh's pattern — capture the output, check exit code, inspect the text for known error tokens, and exit non-zero on a real failure rather than treating it as "not found".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9095c6 by capturing cpflow exists output and failing hard on auth/API error tokens instead of treating every non-zero exit as app-not-found.


if [[ -f "${HOME}/.ssh/cpflow_build_key" ]]; then
eval "$(ssh-agent -s)"
trap 'ssh-agent -k >/dev/null; rm -f "${HOME}/.ssh/cpflow_build_key"' EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXIT trap kills the SSH agent and removes cpflow_build_key, but the ~/.ssh/known_hosts file written in the previous step is not cleaned up here. On ephemeral GitHub-hosted runners this is harmless (the VM is discarded), but if this action is ever adopted on self-hosted runners with a persistent ~/.ssh, the custom known_hosts entries would accumulate across runs.

Consider adding rm -f "${HOME}/.ssh/known_hosts" to the trap, or using a temp file under $RUNNER_TEMP instead of ~/.ssh for the known_hosts.

default: "3.3.1"
cpflow_version:
description: cpflow gem version
required: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI version is pinned here as a default, which is good for reproducibility. Worth noting: since sudo npm install -g fetches from the npm registry at runtime, a compromised release of @controlplane/cli@3.3.1 would silently affect all runs. Consider adding a SHA/integrity check (e.g. via npm install --integrity) or at minimum locking to a digest-pinned download, especially since this step runs in workflows that access staging/production tokens.


current_image="${selected_image:-${first_image}}"
current_version="${selected_version:-${first_version}}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The staging env-var parity check only compares names, not values. It will catch a missing DATABASE_URL in production, but it won't catch a case where production has the right key names but wrong or stale values (e.g. a staging secret was rotated but the production counterpart wasn't updated). This is expected behaviour for a name-only check — just worth documenting explicitly so it's clear what the check does and doesn't guarantee.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in b9095c6: the production promotion workflow checks that production has the same environment variable names as staging, but it does not compare secret values.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD Actions with generated cpflow- flow*

Well-considered overhaul that hardens security, safety, and operational reliability. No blocking issues found.

Security improvements

Token no longer passed via --token flag — old setup-environment exposed it in /proc/cmdline and ps; now uses CPLN_TOKEN env var only. SSH key scoped to a dedicated step so ACTIONS_STEP_DEBUG cannot leak it from the build step. Delete command now gated on author_association (old workflow had no permission check). Fork PR protection blocks syncs before any secret is consumed. Old contains() command matching replaced with exact match.

Safety improvements

cancel-in-progress changed from true to false everywhere — old staging deploy risked partial GVC deployments mid-rollout. Prefix-based delete guard replaces the old substring check for production/staging names. Production rollback logic added with per-workload state capture and restore — old workflow had no rollback at all. Health checks now configurable (HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, HEALTH_CHECK_ACCEPTED_STATUSES).

Issue 1: Fork-PR /deploy-review-app comment silently fails with no PR feedback

When a collaborator comments /deploy-review-app on a fork PR, the job condition passes but Validate review app deployment source exits 1 with only a runner log message. No PR comment explains the rejection. Suggest posting an issues.createComment before exit 1, similar to how the unconfigured-secrets path writes to GITHUB_STEP_SUMMARY.

Issue 2: cpflow exists error detection is string-matching fragile

delete-app.sh uses case matching over stderr to distinguish app-not-found from auth/network errors. The inline TODO acknowledges this. If cpflow changes an error message, the pattern will not match and a real failure will be silently treated as app not found, causing delete to return success vacuously. The TODO to switch to a structured exit code is the right long-term fix.

Issue 3: gem install / npm install without integrity pinning

cpflow-setup-environment/action.yml installs @controlplane/cli@3.3.1 and cpflow 4.2.0 without hash verification. A re-tagged or compromised release would install silently. For production-adjacent CI, consider verifying SHA256 after install or using lockfiles with integrity entries.

Issue 4: Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs on cron with no failure notification. Silent failures accumulate stale apps over days. Consider a step gated on if: failure() that posts a GitHub issue or annotation.

Minor observations

APP_NAME / PR_NUMBER double-assignment is intentional (workflow-level env needed early for concurrency.group, step overwrites with resolved values) but a short comment would clarify intent. cpflow-review-app-help.yml using pull_request_target is correct and well-documented. deploy job in cpflow-deploy-staging.yml re-runs cpflow-setup-environment unnecessarily (only needs cpflow deploy-image) — minor redundancy, acceptable for simplicity.

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case pattern matching over stderr is the right approach given cpflow exists doesn't expose a structured signal, and the inline TODO is the right way to track this. One additional defensive measure worth considering: log the full exists_output to the runner even when the "not found" path is taken, so there's an audit trail if unexpected output silently passed through the filter. Right now the output is printed only if non-empty, but it's easy to miss in logs.

Also worth noting: the pattern list doesn't include "timeout", "connection refused", or "ENOTFOUND" — all plausible network failure messages. If cpflow's Ruby HTTP client times out, it may surface Errno::ECONNREFUSED or Net::ReadTimeout, which would match "Net::", so that's covered. But curl-style timeouts would not. Low risk given this is a Ruby gem, but worth keeping in mind when bumping the cpflow version.

shell: bash
run: |
set -euo pipefail
cpflow cleanup-stale-apps -a "${{ vars.REVIEW_APP_PREFIX }}" --org "${{ vars.CPLN_ORG_STAGING }}" --yes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No failure notification exists for this cron job. If cpflow cleanup-stale-apps fails (auth error, API hiccup, etc.), stale review apps accumulate silently across days.

Consider appending a step:

      - name: Notify on failure
        if: failure()
        run: |
          echo "::error::Stale review app cleanup failed. Check the workflow logs."

Or, if the team monitors GitHub issues, use gh issue create to open a tracking issue on failure so it's not lost in workflow history.

} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a collaborator comments /deploy-review-app on a fork PR, execution reaches this exit 1 with only a runner log message — no feedback is posted to the PR thread. The collaborator sees a red workflow run with no explanation in the PR.

Consider posting a comment before exiting. The GH_TOKEN is available from the Resolve PR ref and commit step but not inherited here; you'd need to add GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} to this step's env block. Something like:

gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \
  --method POST \
  --field body="Fork PRs cannot use /deploy-review-app because this workflow uses repository secrets. Ask a maintainer to deploy from a branch in the base repository." \
  --silent || true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe: fork PR /deploy-review-app rejections now post a PR comment before exiting, while still keeping the deployment blocked because the workflow uses repository secrets.

run: |
set -euo pipefail

sudo npm install -g @controlplane/cli@${{ inputs.cpln_cli_version }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version-pinned but not hash-verified. A malicious actor who gains write access to the @controlplane/cli npm package (or re-tags a release) could substitute a compromised binary. Same applies to gem install cpflow below.

For production-adjacent pipelines, consider adding a post-install checksum check, or at minimum document the expected SHA256 for each pinned version in a comment so drift is detectable during future version bumps.

run: |
# Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
# if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
set -uo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intentional omission of -e here (using set -uo pipefail instead of set -euo pipefail) is the correct design — it lets rollback continue across all workloads even if one cpln call fails, with rollback_failures tracking the aggregate. Worth a brief comment explaining the omission so a future reader doesn't add -e thinking it was forgotten:

Suggested change
set -uo pipefail
# Intentionally no -e: we want every workload to attempt rollback even if earlier ones fail.
set -uo pipefail

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe with a comment explaining that set -e is intentionally omitted so rollback can attempt every workload and aggregate failures.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Token not persisted for subsequent workflow steps
    • The composite action now writes CPLN_TOKEN to GITHUB_ENV so subsequent cpflow and cpln workflow steps inherit authentication without passing the token on the command line.

Create PR

Or push these changes by commenting:

@cursor push f1eee564c7
Preview (f1eee564c7)
diff --git a/.github/actions/cpflow-setup-environment/action.yml b/.github/actions/cpflow-setup-environment/action.yml
--- a/.github/actions/cpflow-setup-environment/action.yml
+++ b/.github/actions/cpflow-setup-environment/action.yml
@@ -63,6 +63,12 @@
           exit 1
         fi
 
+        {
+          echo "CPLN_TOKEN<<__CPLN_TOKEN__"
+          echo "$CPLN_TOKEN"
+          echo "__CPLN_TOKEN__"
+        } >> "$GITHUB_ENV"
+
         create_output=""
         if ! create_output="$(cpln profile create default --org "$ORG" 2>&1)"; then
           if ! echo "$create_output" | grep -qi "already exists"; then

You can send follow-ups to the cloud agent here.

Comment thread .github/actions/cpflow-setup-environment/action.yml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review — PR #732: Generated cpflow GitHub Actions Flow

Overview

This is a well-structured replacement of hand-written CI/CD workflows with the generated cpflow-* composite actions and workflows. The new flows are meaningfully safer and more correct than the old ones across several dimensions. The PR is in draft but the code quality is high enough to call out specifics.


Significant Improvements ✅

  • Secret handling: Old code passed CPLN_TOKEN via --token (visible in /proc/<pid>/cmdline and ps output). New code passes it exclusively via environment — good security hygiene across all three cpln/cpflow call sites.
  • Prefix guard on deletion: delete-app.sh now validates that the app name starts with $REVIEW_APP_PREFIX- before deleting, which is a much stronger guard than the old "reject names containing 'production' or 'staging'" heuristic. The old approach would have blocked deletion of any app with those words in the name (too broad) while leaving other non-review apps unprotected (too narrow).
  • Fork PR protection: The deploy workflow explicitly detects fork PRs and blocks them from accessing repository secrets — the old workflow had no such guard.
  • Author association gating: Slash commands now check author_association is OWNER, MEMBER, or COLLABORATOR. The old delete workflow had no collaborator check at all.
  • Non-cancelling concurrency (cancel-in-progress: false): Prevents partial deployments and half-deleted apps — this was cancel-in-progress: true in the old staging workflow.
  • Multi-workload rollback: The promote workflow rolls back all workloads independently on failure, aggregating errors rather than stopping at the first failure.
  • Env-var parity check: Pre-promotion check that production has all staging env var names prevents silent nil variable errors after promotion.
  • Health checks with configurable accepted statuses: The default 200 301 302 is well-reasoned for auth-gated Rails apps; HEALTH_CHECK_ACCEPTED_STATUSES makes it overridable without editing the file.
  • Randomized heredoc delimiter in capture-current: EOF_$(openssl rand -hex 8) correctly prevents heredoc injection if rollback state ever contained a literal EOF line.

Issues

Bug: deploy job may run even when build fails in cpflow-deploy-staging.yml

In GitHub Actions, an explicit if: on a job replaces the implicit success() guard. The deploy job condition is:

if: needs.validate-branch.outputs.is_deployable == 'true'

Because this doesn't check needs.build.result, the deploy job will run even if the build job fails (as long as the branch is deployable). This burns runner minutes and produces a confusing deploy attempt against a missing image.

Fix: add && needs.build.result == 'success' to the condition.

Risk: String-matching on cpflow exists error output is fragile

delete-app.sh (and the inline version in cpflow-deploy-review-app.yml) detect "real" errors by checking for specific strings like "Double check your org", "ERROR", "Traceback", "Net::". The file already has a # TODO comment acknowledging this, but the consequence of missing a new error pattern is silently treating a real failure as "app not found" — which could skip deletion without any alert.

This is acceptable given the stated cpflow limitation, but the heuristic string list should be kept in sync with cpflow's actual output. An alternative interim approach: treat any non-zero exit with non-empty stderr as a failure unless it matches a known "not found" pattern, rather than enumerating known error patterns.

Minor: Workflow-level APP_NAME env var is redundant and overridden in cpflow-deploy-review-app.yml

The top-level env: block in the deploy review app workflow sets APP_NAME from the raw event payload, but the Resolve PR ref and commit step later overwrites it in $GITHUB_ENV with the normalized PR number. The step-level value wins, making the workflow-level declaration dead configuration. It's also the only workflow in this set that has this pattern — the others don't duplicate APP_NAME this way.


Nitpicks

  • cpflow-setup-environment/action.yml: gem install cpflow -v ... without --no-document is slower than needed on CI runners.
  • cpflow-cleanup-stale-review-apps.yml: The Validate required secrets and variables step runs as a plain shell: bash step but has no id: — this is fine, but if you ever want to condition a downstream step on it, you'll need to add one.
  • The old nightly-remove-stale-review-apps.yml hardcoded the app name (qa-react-webpack-rails-tutorial) rather than using ${{ vars.REVIEW_APP_PREFIX }}. The new workflow correctly uses the variable — worth calling out as a subtle bug fix.

Summary

The core logic is solid and the security posture is materially improved. The one real bug (deploy running after build failure) should be fixed before merge. The string-matching fragility is an acknowledged limitation with a TODO already in place.

fi
deploy_args+=(--org "${CPLN_ORG}" --verbose)

cpflow deploy-image "${deploy_args[@]}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: deploy runs even when build fails.

When a job has an explicit if:, GitHub Actions drops the implicit success() guard. Because this condition only checks validate-branch's output, the deploy job will start even if the build job failed, wasting runner minutes and attempting to deploy a missing image.

Suggested change
cpflow deploy-image "${deploy_args[@]}"
if: needs.validate-branch.outputs.is_deployable == 'true' && needs.build.result == 'success'

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string-matching heuristic on cpflow exists error output (lines 43–54) is the main reliability risk in this file. The consequence of an unmatched error pattern is silently exiting 0 ("app not found") when the API actually returned an error, meaning the delete is skipped without any visible failure.

An alternative that inverts the logic: treat any non-zero exit with non-empty output as a failure unless it matches a known "not found" pattern:

# instead of matching known error strings, match known "not found" patterns
case "$exists_output" in
  *"not found"*|*"does not exist"*)
    echo "⚠️ Application does not exist: $APP_NAME"
    exit 0
    ;;
  *)
    echo "❌ ERROR: failed to determine whether application exists" >&2
    printf '%s\n' "$exists_output" >&2
    exit 1
    ;;
esac

This is safer by default — unknown output is treated as an error rather than a no-op. The tradeoff is that you'd need to know the exact "not found" wording from cpflow exists, which has the same coupling problem in the other direction. The TODO comment already captures this; just flagging it for awareness.

echo "ready=false" >> "$GITHUB_OUTPUT"
{
echo "Control Plane review app automation is not configured yet."
echo
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This top-level APP_NAME is computed from the raw event payload and is later overwritten in $GITHUB_ENV by the Resolve PR ref and commit step (echo "APP_NAME=..." >> "$GITHUB_ENV"). The step-level write wins, so this declaration is dead configuration and can be confusing. Consider removing it and relying solely on the step-computed value, which is already the authoritative one (it handles all three event types correctly).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow Generated GitHub Actions Flow

This is a meaningful security and reliability upgrade over the handwritten workflows. Several issues from the old setup are fixed here; I've also flagged a few new items worth addressing.


✅ Notable Improvements

  • Token passed via env, not --token flag — prevents leakage in /proc/<pid>/cmdline and ps output (old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN")
  • Author association gating on /deploy-review-app and /delete-review-app — the old delete workflow accepted the command from any commenter
  • Fork PR protection in the deploy workflow — old workflow ran on all pull_request events without a fork check, allowing fork PRs to attempt using repo secrets
  • Prefix guard in delete-app.sh — prevents accidental deletion of non-review-app resources
  • SSH key isolated to a dedicated step — keeps it out of the main build step's env where ACTIONS_STEP_DEBUG=true would expose it
  • Randomized heredoc delimiters in the promotion workflow — prevents content injection if rollback state contains EOF
  • Old staging workflow fixed — the old deploy-to-control-plane-staging.yml triggered on branches: ['*'] (every push), burning runners on every feature branch; new workflow correctly filters to main/master
  • cancel-in-progress: false on all deployment concurrency groups — prevents partial rollout state from mid-cancelled deploys

🔴 Issues

1. cpflow-delete-review-app.yml uses pull_request_target without an explanation comment

The delete workflow triggers on pull_request_target: [closed]. This event runs with write permissions in the base repo context and can post PR comments on fork PRs. It's the correct choice here (since no PR code is checked out), but it's a well-known footgun — future maintainers may switch it to pull_request thinking it's "safer" and break fork PR cleanup. cpflow-review-app-help.yml has a good comment explaining this; the delete workflow should have one too.

2. cpflow exists error-pattern matching is duplicated

The same fragile string-matching logic ("Double check your org", "ERROR", "Traceback", "Net::", etc.) appears both in delete-app.sh and inline in the Check if review app exists step of cpflow-deploy-review-app.yml. The delete-app.sh has a TODO explaining the fragility; the workflow copy does not. If cpflow changes its error output, both need updating in sync. Consider either extracting this into a shared shell script, or at minimum adding the same TODO comment in the workflow.


🟡 Warnings

3. CPLN_TOKEN is written to GITHUB_ENV and persists to all subsequent steps

In cpflow-setup-environment/action.yml (lines 67–72), the token is written to GITHUB_ENV so later workflow steps can authenticate. This is necessary for cpflow to work, but it means the token is present in the environment of every subsequent step — including any step that might env | sort or log environment state. Consider documenting this tradeoff in a comment.

4. Production health check accepts 301/302 by default

HEALTH_CHECK_ACCEPTED_STATUSES defaults to '200 301 302'. A redirect to a login page or a broken URL returns a redirect, so the health check passes even if the app is misconfigured post-deploy. The docs note the recommended mitigation (set a dedicated /health endpoint and override to "200"). Consider making the default stricter, or adding a warning to the step summary when a redirect is accepted.

5. Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs at midnight UTC. If it fails (bad token, cpflow error, etc.), there is no alert — stale apps silently accumulate cost. Consider adding a Slack/email notification on failure, or at minimum ensure the repo has GitHub workflow failure notifications enabled for scheduled runs.


ℹ️ Informational

6. /help command has no author_association check

cpflow-help-command.yml responds to /help from any commenter (no association guard). This is probably intentional — help is informational — but on a high-volume repo the bot will reply to every /help comment from any user. Noting for awareness.

7. ruby/setup-ruby@v1 and actions/github-script@v7 use floating major versions

These will pick up any non-breaking updates automatically, which is generally fine. However, actions/checkout@v4 is also a floating major — all three could be pinned to a specific SHA for a more locked-down supply chain posture if that's a concern.

name: Delete Review App

on:
pull_request_target:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull_request_target deserves an explanation comment here.

pull_request_target runs in the context of the base repo (write permissions, access to secrets) and is required for fork PR cleanup — pull_request can't post comments on fork PRs because the token is read-only. This is safe here because no PR code is ever checked out (the actions/checkout@v4 below fetches base-branch code only).

cpflow-review-app-help.yml has a good comment explaining this exact pattern. Without a parallel comment here, a future maintainer may "fix" this to pull_request thinking it's safer, silently breaking fork PR cleanup.

Suggested change
pull_request_target:
# pull_request_target is intentional: it has write permission to comment on PRs from
# forks, where `pull_request` would be read-only. This is safe because no PR code is
# checked out — the job only calls cpflow with base-repo credentials.
# Do not switch this to `pull_request` or add a checkout of the PR ref without re-evaluating.
pull_request_target:

Comment on lines +66 to +72
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing CPLN_TOKEN to GITHUB_ENV propagates it to every subsequent step in the calling workflow — not just the steps that explicitly use it. This is a necessary tradeoff (cpflow picks it up from the environment automatically), but it's worth documenting so future maintainers don't inadvertently add steps that log environment state.

Suggested change
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
# Persist CPLN_TOKEN to GITHUB_ENV so all subsequent steps can authenticate
# with cpflow/cpln without putting the token on the command line.
# Trade-off: the token is present in every subsequent step's environment.
# Do not add steps that dump env vars (e.g., `env | sort`) after this point.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"

Comment on lines +28 to +35
# TODO: replace this string-matching with a structured signal once `cpflow exists` exposes one
# (e.g. a distinct exit code for "not found" vs. API/auth errors, or `cpflow exists --json`).
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same cpflow exists error-pattern matching appears verbatim in the Check if review app exists step in cpflow-deploy-review-app.yml (line ~195). The TODO comment here is accurate but only covers this file — the workflow copy is uncovered.

If cpflow adds a new error pattern (e.g. "rate limit" or "timeout"), both locations need updating in sync. Consider extracting this into a shared script (e.g. .github/scripts/cpflow-app-exists.sh) that both the action and the workflow call, so the pattern list is maintained in one place.

# repo variable to tighten this for apps that expose a dedicated health endpoint
# (e.g. "200" for a plain /health, or "200 401 403" for apps that auth-gate / without
# redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting 301/302 as "healthy" means a redirect to a login page, maintenance page, or even a misconfigured CDN passes the health check. The comment correctly explains why (curl without -L, Rails auth-gating /), but a redirect can pass even when the destination is broken.

Consider adding a step summary warning when a non-200 status is accepted, so operators reviewing the promotion log have visibility:

if [[ "${http_status}" == "${accepted}" ]]; then
  if [[ "${http_status}" != "200" ]]; then
    echo "::warning::Health check passed with HTTP ${http_status} (not 200) — verify the redirect destination is healthy."
  fi
  echo "healthy=true" >> "$GITHUB_OUTPUT"
  exit 0
fi

Also worth documenting in .controlplane/shakacode-team.md that setting HEALTH_CHECK_ACCEPTED_STATUSES=200 and adding a /health endpoint is the recommended production posture.

on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduled workflows that fail silently are easy to miss — GitHub does send an email for failed scheduled runs but only to the repo owner by default, and only after 7 days of consecutive failures (as of current GitHub behavior).

Consider adding a failure notification step, or at minimum documenting that the team should verify GitHub's "Notify me of failed workflow runs" notification setting is enabled. Stale review apps that aren't cleaned up accrue Control Plane costs.

Suggested change
- cron: "0 0 * * *"
- cron: "0 3 * * *" # 3 AM UTC — avoids exact midnight when many repos run scheduled jobs simultaneously

(The cron timing is a minor suggestion — staggering off midnight reduces GitHub's scheduler contention, making the run more likely to start on time.)

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD with generated cpflow-* GitHub Actions flow

This PR replaces a set of hand-crafted GitHub Actions workflows with the generated cpflow-* composite action and workflow set. Overall the migration is a meaningful improvement in security, robustness, and maintainability. Below is a detailed assessment.


What Changed

Old New
setup-environment action cpflow-setup-environment action
build-docker-image action cpflow-build-docker-image action
delete-control-plane-app action (no prefix guard) cpflow-delete-control-plane-app action (prefix-guarded)
deploy-to-control-plane-review-app.yml cpflow-deploy-review-app.yml
delete-review-app.yml (pull_request closed) cpflow-delete-review-app.yml (pull_request_target closed)
deploy-to-control-plane-staging.yml (triggered on all branches) cpflow-deploy-staging.yml (main/master only)
nightly-remove-stale-review-apps.yml cpflow-cleanup-stale-review-apps.yml
promote-staging-to-production.yml cpflow-promote-staging-to-production.yml
help-command.yml cpflow-help-command.yml + cpflow-review-app-help.yml

Security Improvements ✅

  • Token never on argv: cpflow-setup-environment passes CPLN_TOKEN via environment variable rather than --token on the command line, avoiding /proc/<pid>/cmdline and ps leakage. The old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN".
  • SSH key isolation: cpflow-build-docker-image dedicates a separate step for writing the SSH key so DOCKER_BUILD_SSH_KEY is not present in the main build step's environment, even under ACTIONS_STEP_DEBUG=true.
  • Prefix guard on deletes: cpflow-delete-control-plane-app/delete-app.sh checks that APP_NAME starts with REVIEW_APP_PREFIX- before calling cpflow delete. The old action had no such guard.
  • Fork PR gating: cpflow-deploy-review-app.yml explicitly blocks fork PRs on pull_request events (cannot access secrets anyway) and adds author_association checks for comment-triggered deploys. The old workflow triggered on any PR synchronize/reopened without fork checks.
  • pull_request_target used correctly: cpflow-delete-review-app.yml uses pull_request_target for the closed-PR cleanup path (to access secrets from fork PRs) and checks out only the base repo — the comment in the file explains this. This is a well-known footgun that has been handled correctly here.

Robustness Improvements ✅

  • cancel-in-progress: false on all deploy/delete concurrency groups. Mid-rollout cancellation of cpflow deploy-image can leave workloads in a partially-deployed state; letting in-flight runs finish is the right call.
  • Multi-workload rollback: The production promotion workflow now captures per-container image state for all workloads and restores each one on failure, rather than only rolling back the single rails workload image.
  • Randomized heredoc delimiters in cpflow-promote-staging-to-production.yml prevent a stray EOF in a variable value from truncating the multi-line GITHUB_OUTPUT write.
  • --run-release-phase is now opt-in: cpflow-deploy-review-app.yml and cpflow-deploy-staging.yml inspect cpflow config to decide whether to pass --run-release-phase, rather than always passing it. This avoids failures when the flag isn't configured.
  • Health check window is larger: 24 retries × 25s = ~10 min (old: 12 × 10s = ~2 min). More headroom for slow Rails cold boots.

Issues and Suggestions

Medium: Stale cpflow exists error-matching is fragile

delete-app.sh and cpflow-deploy-review-app.yml both parse free-form stderr output from cpflow exists to distinguish "app not found" from authentication/API errors. The comment in the file acknowledges this with a TODO. This approach is inherently brittle if cpflow changes its error output. Consider filing an issue against cpflow to request a structured signal (distinct exit code or --json output) and linking it from the TODO so it doesn't get forgotten.

Low: cpflow-deploy-review-app.ymlsteps.source.outputs.allowed can be empty on workflow_dispatch

The Validate review app deployment source step sets allowed=true for same-repo branches and allowed=false for fork PRs via pull_request events. For workflow_dispatch, steps.resolve-pr.outputs.same_repo could be "false" if the PR is from a fork, which falls into the issue_comment branch of the condition and calls gh pr comment — but workflow_dispatch on a fork PR should perhaps just fail rather than try to comment. Low risk since workflow_dispatch is manual, but the code path is slightly inconsistent.

Low: cpflow-setup-environment — redundant null checks

The Setup Control Plane profile and registry login step validates that CPLN_TOKEN and ORG are non-empty via shell if [[ -z ... ]] guards. These variables are both declared required: true in the action's inputs: block, so Actions will already refuse to call the action without them. The guards add no safety but don't hurt anything.

Low: Old staging workflow triggered on branches: ['*']

The deleted deploy-to-control-plane-staging.yml ran on every push to every branch and relied on a validate-branch job to short-circuit. The new cpflow-deploy-staging.yml correctly restricts the on.push.branches filter to ["main", "master"] with a comment explaining the GitHub limitation on vars in branch filters. This is an improvement, but it means pushes to a non-main/master STAGING_APP_BRANCH will not trigger the workflow unless the file is edited. The comment in the file explains this trade-off clearly — no action needed, just confirming the approach is understood.

Nit: cpflow-cleanup-stale-review-apps.yml — no contents: write permission for release creation

The promote workflow needs contents: write to create GitHub releases; the cleanup workflow does not. Both look correct on permissions.

Nit: cpflow-deploy-review-app.yml line 1011 — long if: conditions

Several steps have near-identical if: conditions like:

if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')

This is repetitive across 10+ steps. A job-level output or an environment variable set once would be cleaner, but this is a style point and not a functional issue in generated code.


Documentation ✅

  • .controlplane/readme.md and shakacode-team.md are updated to reference the new workflow names, slash commands, and REVIEW_APP_PREFIX naming convention.
  • Required/optional repository secrets and variables are now fully documented in shakacode-team.md.
  • The pull_request_target usage in cpflow-delete-review-app.yml and cpflow-review-app-help.yml is explained with inline comments that document why the trigger was chosen, which is important for future maintainers.

Summary

The new generated flow is substantially more secure and robust than what it replaces. The main open item is the cpflow exists output-parsing fragility, which the code already acknowledges. The PR is in good shape to merge once the draft status is resolved and tests can be run.

exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment here is well-placed. To make this easier to track, consider filing an issue against the cpflow gem linking to this file, so the workaround has a visible upstream ticket. The risk of silent misclassification (an unrecognized error string treated as "app not found") is the main hazard — a regression test that mocks cpflow exists returning an exit-1 with a novel error string would catch drift before it reaches production.

echo "Failed to post fork rejection comment to PR #${PR_NUMBER}" >&2
fi

echo "${message}" >&2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the trigger is workflow_dispatch and the PR being deployed is from a fork (same_repo == "false"), neither the pull_request branch (line 143) nor the same_repo == "true" branch (line 138) matches, so execution falls through to this gh pr comment call followed by exit 1. That is technically correct — the deployment is blocked — but it reaches the gh pr comment path that was written for the issue_comment case, which may be surprising. A small guard makes the intent explicit:

Suggested change
echo "${message}" >&2
# workflow_dispatch with a fork PR: block the run without attempting a PR comment.
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
echo "${message}" >&2
exit 1
fi
message="Review app deploys from fork pull requests are not allowed because this workflow uses repository secrets."
if ! gh pr comment "${PR_NUMBER}" --body "${message}"; then
echo "Failed to post fork rejection comment to PR #${PR_NUMBER}" >&2
fi
echo "${message}" >&2

run: |
set -euo pipefail

if [[ -z "$CPLN_TOKEN" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both token and org are declared required: true in inputs:, so Actions will already refuse to invoke this composite action without them. These null checks will never fire. They're harmless, but if left in they may mislead a future reader into thinking there's a code path where these are empty.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

PR #732 Review: cpflow Generated GitHub Actions Flow

Overview

This PR replaces hand-written Control Plane GitHub Actions with the output of cpflow generate-github-actions, streamlining CI/CD for review apps, staging, and production. It is a substantial improvement in security, correctness, and maintainability.


Security Improvements (Notable)

  • Token exposure fixed: The old setup-environment passed --token "$TOKEN" directly to cpln profile update default, exposing the token in /proc/<pid>/cmdline and ps output. The new action correctly passes it via CPLN_TOKEN env var.
  • SSH key isolation: cpflow-build-docker-image keeps the SSH private key in a dedicated step so it never appears in the build step's environment — mitigating ACTIONS_STEP_DEBUG exposure.
  • Fork PR protection hardened: The old deploy workflow allowed any comment containing /deploy-review-app (using contains()), with no author_association check. The new workflow requires an exact body match (==) and gates on OWNER, MEMBER, or COLLABORATOR.
  • Delete safety guard added: delete-app.sh validates the app name against REVIEW_APP_PREFIX before calling cpflow delete, preventing accidental deletion of non-review apps. The old script only checked for "production" or "staging" in the name.
  • Old nightly workflow bug fixed: The old nightly cleanup hardcoded the prefix qa-react-webpack-rails-tutorial instead of reading from vars.REVIEW_APP_PREFIX.

Correctness Improvements

  • Multi-workload rollback: Production promotion now captures and rolls back all app workloads (not just containers[0].image of the rails workload), making rollback safer for apps with sidecars or workers.
  • Env-var parity check before promotion: Compares GVC env var names between staging and production before promoting — prevents deploying with missing config.
  • Proper pull_request_target for review-app-help: The old help workflow used pull_request, which has no write access for fork PRs. The new version correctly uses pull_request_target without a code checkout.
  • cancel-in-progress: false on all deployment concurrency groups: Prevents partially-deployed states caused by mid-flight cancellations.
  • Staging branch filter fixed: The old staging workflow triggered on branches: ['*'] (every branch). The new workflow correctly gates on main/master with a configurable STAGING_APP_BRANCH.

Issues Found

Misleading comment: HEALTH_CHECK_RETRIES/INTERVAL are not actually overridable via repo vars

In cpflow-promote-staging-to-production.yml lines 15–20, the comment says:

"Override these by editing this file or by setting the matching repository variable."

But HEALTH_CHECK_RETRIES: 24 and HEALTH_CHECK_INTERVAL: 15 are hardcoded literals. Unlike HEALTH_CHECK_ACCEPTED_STATUSES (line 28) which reads ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}, the retries/interval values ignore any repo variable of the same name. Either change them to ${{ vars.HEALTH_CHECK_RETRIES || 24 }} / ${{ vars.HEALTH_CHECK_INTERVAL || 15 }}, or remove the repo-variable claim from the comment. As-is, anyone who sets HEALTH_CHECK_RETRIES as a repo variable will see no effect and waste time debugging.

Duplicated cpflow exists error-string detection

The fragile output-string matching for distinguishing "app not found" vs. API/auth errors is copy-pasted in:

  • cpflow-delete-control-plane-app/delete-app.sh (lines 34–44)
  • cpflow-deploy-review-app.yml (lines 204–213)

delete-app.sh already has a TODO acknowledging this. If the list of error tokens ever needs to change, both places need updating simultaneously. Consider extracting to a shared helper script (e.g., .github/actions/cpflow-delete-control-plane-app/cpflow-exists.sh) that both callers source.


Minor Notes

  • cpflow-deploy-staging.yml re-installs the Ruby toolchain and cpflow gem in both the build and deploy jobs (separate runners). Functionally correct since the Docker image is pushed to the registry between jobs, but you pay two gem-install times per staging deploy. Gem caching or a shared setup artifact could help if build times grow.
  • The hardcoded GitHub.com SSH host keys in cpflow-build-docker-image/action.yml are stable but will need updating if GitHub ever rotates them. The fallback to DOCKER_BUILD_SSH_KNOWN_HOSTS for non-GitHub hosts is a good escape hatch.
  • cpflow-setup-environment writes CPLN_TOKEN to $GITHUB_ENV, which is intentional (documented in the code), but it gives the token to every subsequent step in the job — broader scope than strictly necessary. GitHub's secret masking mitigates log exposure.
  • Version pins (@controlplane/cli@3.3.1, cpflow 4.2.0) are hardcoded rather than managed by Renovate/Dependabot. The new docs section on regeneration serves as a reminder to keep these current.

Verdict

The security and correctness improvements here are significant — particularly the token exposure fix, the fork PR hardening, and the multi-workload rollback. The one item worth addressing before merge is the misleading HEALTH_CHECK_RETRIES/INTERVAL comment, since it will mislead the next person trying to tune those values.

Comment on lines +15 to +20
# Override these by editing this file or by setting the matching repository variable.
# Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
# (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
# most Rails cold boots (asset precompile + db:migrate + workload readiness).
HEALTH_CHECK_RETRIES: 24
HEALTH_CHECK_INTERVAL: 15
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says these can be overridden "by setting the matching repository variable", but unlike HEALTH_CHECK_ACCEPTED_STATUSES (line 28 which reads ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}), these two are hardcoded literals — setting a repo variable with the same name has no effect.

Either wire them up to match the pattern used for HEALTH_CHECK_ACCEPTED_STATUSES:

Suggested change
# Override these by editing this file or by setting the matching repository variable.
# Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
# (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
# most Rails cold boots (asset precompile + db:migrate + workload readiness).
HEALTH_CHECK_RETRIES: 24
HEALTH_CHECK_INTERVAL: 15
HEALTH_CHECK_RETRIES: ${{ vars.HEALTH_CHECK_RETRIES || 24 }}
HEALTH_CHECK_INTERVAL: ${{ vars.HEALTH_CHECK_INTERVAL || 15 }}

…or remove the "setting the matching repository variable" claim from the comment above so the next person who tries to tune these doesn't spend time debugging why their repo var is ignored.

Comment on lines +30 to +35
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same error-token list and detection logic is duplicated verbatim in cpflow-deploy-review-app.yml (the "Check if review app exists" step at line ~204). When this list needs to grow (e.g., a new cpflow exists error pattern), both files need to change in sync — which the TODO already flags.

A practical option: extract the detection to a small shared shell script (e.g., .github/actions/cpflow-delete-control-plane-app/cpflow-app-exists.sh) that returns 0/1/2 for exists/not-found/error, then source or call it from both places. That makes the error-token list a single source of truth and keeps the TODO resolution contained to one file.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unnecessary CI work on every new PR open
    • Removed the opened pull_request trigger so the deploy workflow only runs on synchronize and reopened events while PR-open help remains handled by the help workflow.

Create PR

Or push these changes by commenting:

@cursor push 2c303a005e
Preview (2c303a005e)
diff --git a/.github/workflows/cpflow-deploy-review-app.yml b/.github/workflows/cpflow-deploy-review-app.yml
--- a/.github/workflows/cpflow-deploy-review-app.yml
+++ b/.github/workflows/cpflow-deploy-review-app.yml
@@ -4,7 +4,7 @@
 
 on:
   pull_request:
-    types: [opened, synchronize, reopened]
+    types: [synchronize, reopened]
   issue_comment:
     types: [created]
   workflow_dispatch:
@@ -35,7 +35,7 @@
 
 jobs:
   deploy:
-    # Skip synchronize/opened events from fork PRs at the job level — they cannot access
+    # Skip pull_request events from fork PRs at the job level — they cannot access
     # repository secrets anyway, so running any steps just burns billable minutes. Users
     # can still manually deploy a fork PR via `/deploy-review-app` (gated below by
     # author_association) or workflow_dispatch.

You can send follow-ups to the cloud agent here.


on:
pull_request:
types: [opened, synchronize, reopened]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary CI work on every new PR open

Low Severity

Adding opened to the pull_request types (the old workflow only had synchronize, reopened) causes the full workflow to run on every new PR — including Ruby setup, npm install of the CLI, and gem install — only to determine the review app doesn't exist yet and write a step summary. That summary is only visible in the Actions tab, while the separate cpflow-review-app-help.yml already posts a help comment directly on the PR. A brand-new PR can never have an existing review app, so opened adds several minutes of billable CI time per PR with no deployment benefit.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9a390c1. Configure here.

@claude
Copy link
Copy Markdown

claude Bot commented May 1, 2026

Code Review

Overview

This PR replaces hand-written Control Plane GitHub Actions with the generated cpflow-* action/workflow set. It touches only CI/CD automation — no application code changes. The scope is large (+1787/-1147) but the change is well-structured and represents a clear improvement over the existing implementation across several dimensions.


Significant Improvements

Security

  • Old setup-environment action passed the CPLN token via --token on the command line, exposing it in /proc/<pid>/cmdline and ps output. The new cpflow-setup-environment action passes it via CPLN_TOKEN env var. Same fix applies in the old promote-staging-to-production.yml where -t ${{ secrets.CPLN_TOKEN_STAGING }} is now passed via CPLN_UPSTREAM_TOKEN env var.
  • The new cpflow-validate-config action uses bash indirect variable lookup (${!name:-}) so secret values are checked by name without ever being echo'd into logs.
  • cpflow-build-docker-image isolates the SSH key to a dedicated step so it is never present in the main build step's environment (preventing ACTIONS_STEP_DEBUG=true exposure).

Correctness — fork PR cleanup (critical fix)

  • The old delete-review-app.yml used pull_request trigger for PR close events. Fork PRs run pull_request in the fork's context and cannot access base-repo secrets, so fork review apps were never automatically deleted. The new cpflow-delete-review-app.yml correctly uses pull_request_target for the closed event, which runs with base-repo secrets.

Safety — delete guard

  • Old delete-app.sh blocked deletion by checking if the app name contains "production" or "staging". An app named qa-production-preview-123 would be incorrectly blocked. The new cpflow-delete-control-plane-app enforces a strict prefix match against REVIEW_APP_PREFIX, which is much tighter.

Nightly cleanup — hardcoded prefix bug fixed

  • The old nightly-remove-stale-review-apps.yml hardcoded qa-react-webpack-rails-tutorial as the cleanup prefix instead of using REVIEW_APP_PREFIX. The new workflow correctly reads from the repo variable.

Staging workflow scope

  • Old staging workflow used push: branches: ['*'] (all branches!), meaning every push anywhere triggered the deploy pipeline (which then gated on validate-branch). The new workflow scopes the push trigger to branches: ["master"].

Concurrency safety

  • Old staging deploy used cancel-in-progress: true, which could cancel a mid-flight deploy leaving staging in a partially-deployed state. New workflow uses cancel-in-progress: false throughout.

Release phase opt-in

  • Old workflows unconditionally passed --run-release-phase. The new cpflow-detect-release-phase action only adds the flag when a release_script: is configured, preventing failures on apps that don't have a release phase.

Production rollback

  • Old promotion workflow only captured the single rails workload image. New workflow captures all workloads and performs per-container-image rollback via cpln workload update --set, continuing through failures rather than aborting early.

Issues Found

Minor

  1. Fork issue_comment path exits 1 — in cpflow-deploy-review-app.yml, when a collaborator triggers /deploy-review-app on a fork PR, the workflow exits non-zero (red ✗ run) even though this is an expected, handled condition. The pull_request fork path correctly exits 0 with allowed=false; the issue_comment fork path should do the same (detailed inline comment below).

  2. Env-var parity check is GVC-scope onlycpflow-promote-staging-to-production.yml compares only GVC-level env var names. Workload-level or secret-store-backed variables won't be caught. This is an incremental improvement over the old code (which had the same scope) but worth documenting so operators know what the check covers.

  3. cpflow-review-app-help.yml posts on every PR open, including draft PRspull_request_target: types: [opened] fires for drafts. If teams regularly open drafts that never get a review app, the auto-comment will become noise. Consider adding if: github.event.pull_request.draft == false, or a note that teams may want to restrict this.

Non-blocking / suggestions

  • The cpflow-help.md and cpflow-review-app-help.yml both use pull_request_target. The help file table lists REVIEW_APP_PREFIX as required: yes with example review-app, but the docs/shakacode-team.md say it must be qa-react-webpack-rails-tutorial-pr. These are consistent but the help table example could be more specific to avoid confusion on setup.
  • The cpflow-cleanup-stale-review-apps.yml workflow passes REVIEW_APP_PREFIX to cpflow cleanup-stale-apps -a. Make sure the cpflow gem's cleanup-stale-apps accepts a prefix (rather than a full app name) as the -a argument — the old workflow passed qa-react-webpack-rails-tutorial (without the -pr suffix), so if the behavior of that command changed, this could silently skip cleanup.

Test Coverage

The PR description notes that rspec could not run locally due to missing PostgreSQL. The changes are CI/CD-only (no Ruby application code), so rspec coverage is not directly applicable. Functional validation via live GitHub Actions runs is the right path here, and the PR description covers the readiness checks that were run (github-flow-readiness, YAML parse, rubocop, diff check).


Summary

This is a well-executed migration with meaningful security, correctness, and safety improvements. The identified issues are minor; the fork issue_comment exit-code inconsistency is the only one worth addressing before merge. Everything else can be follow-up or is non-blocking.


echo "${message}" >&2
exit 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fork PR rejection for issue_comment exits 1 (job fails, red ✗), but the equivalent pull_request path exits 0 with allowed=false (job succeeds, subsequent steps are skipped). A collaborator triggering /deploy-review-app on a fork PR will see a failed workflow run, which is noisy and confusing when the PR comment already explains why it was blocked.

Consider aligning the two paths:

Suggested change
echo "allowed=false" >> "$GITHUB_OUTPUT"
message="Review app deploys from fork pull requests are not allowed because this workflow uses repository secrets."
if [[ "${{ github.event_name }}" == "issue_comment" ]]; then
if ! gh pr comment "${PR_NUMBER}" --body "${message}"; then
echo "Failed to post fork rejection comment to PR #${PR_NUMBER}" >&2
fi
fi
{
echo "${message}"
} >> "$GITHUB_STEP_SUMMARY"

Then all subsequent steps already gate on steps.source.outputs.allowed == 'true', so no other changes are needed.

run: |
set -euo pipefail

staging_vars="$(CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln gvc get "${STAGING_APP_NAME}" --org "${CPLN_ORG_STAGING}" -o json | jq -r '.spec.env // [] | .[].name' | sort)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check compares GVC-level environment variable names only. Variables set at the workload level (inside a workload spec's env: block rather than the GVC's env: block) or backed by Control Plane secret stores will not be caught by this check.

This is an improvement over the old code (same scope), but worth documenting so operators understand the guarantee: the check ensures no GVC-level staging var is absent from production at the GVC level — it does not cover workload-scoped vars or secret references.

A comment on the if [[ -z "${staging_vars}" ]] branch might also help — if staging uses only workload-level vars, the check silently skips entirely rather than flagging a misconfiguration.

# Do not switch this to `pull_request` or add a checkout step without re-evaluating.
pull_request_target:
types: [opened]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fires on every PR open, including draft PRs. Teams that routinely open drafts before they're ready for review will see an auto-comment on every draft, which can become noise.

Consider restricting to non-draft PRs:

Suggested change
pull_request_target:
types: [opened]
# Remove the filter below if you want the help comment on draft PRs too

…and adding a job-level guard:

    if: github.event.pull_request.draft == false

Or document this behavior in cpflow-help.md if the team intentionally wants drafts included.

@claude
Copy link
Copy Markdown

claude Bot commented May 1, 2026

Code Review: cpflow GitHub Actions Migration

This PR replaces the hand-written Control Plane GitHub Actions with the generated cpflow-* composite action set. The overall direction is sound - the generated actions are more robust, better commented, and fix several real issues in the old code.

Strengths: Token passed via CPLN_TOKEN env var (not --token arg, prevents process-list exposure). SSH key isolated to dedicated step with umask 077 and chmod 600. Heredoc delimiter randomization via openssl rand. Fork PR safety: deploy uses pull_request; delete help use pull_request_target. Deletion guard enforces REVIEW_APP_PREFIX allowlist. Production rollback covers all workloads.

Issues:

  1. cpflow-deploy-review-app.yml - Initial checkout runs before fork guard. Safe under pull_request (base-repo context), but a comment explaining this would prevent future misrefactoring.
  2. cpflow-deploy-staging.yml - validate-branch job checks out the repo to call cpflow-validate-config, but that action only does indirect bash variable lookups. Dropping the checkout saves 15s per staging deploy.
  3. cpflow-promote-staging-to-production.yml - permissions: contents: write is only needed for gh release create. Annotate why write is required.
  4. cpflow-validate-config/action.yml - Entry trim strips exactly one leading/trailing space; two spaces would silently fail the lookup.
  5. cpflow-help-command.yml - Workflow starts on every issue comment; if: guard is correct but adds noise to Actions history.

Regressions fixed: delete workflow pull_request -> pull_request_target (fork PR secrets access); deploy command substring match -> exact match; staging cancel-in-progress: true -> false; --token arg -> CPLN_TOKEN env; name-contains delete guard -> REVIEW_APP_PREFIX allowlist.

Verdict: Ready to merge with minor notes. Clear improvement over old workflows across security, fork PR safety, and production rollback. Draft status appropriate while repo vars/secrets are verified.

- name: Validate required secrets and variables
id: config
uses: ./.github/actions/cpflow-validate-config
env:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Initial checkout step runs unconditionally before the fork origin guard at the Validate review app deployment source step. Under pull_request this is safe — GitHub always runs the workflow in the base repository's context, so even fork PRs check out base-repo code, not fork code. A comment here would make this safe ordering explicit and prevent a future maintainer from accidentally reversing the steps:

Suggested change
env:
# NOTE: pull_request events always execute in the base-repo context, so
# checking out here before the fork guard is intentional and safe.
- name: Initial checkout


build:
needs: validate-branch
if: needs.validate-branch.outputs.is_deployable == 'true'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checkout in validate-branch is only needed because cpflow-validate-config is a local composite action (./.github/actions/...). However, cpflow-validate-config only performs indirect bash variable lookups — it never reads files from the working tree. The checkout could be dropped from this job and the validate config step could be moved to the build job (which already checks out), saving ~15 seconds per staging push.

If keeping the checkout here for simplicity, this comment would clarify intent:

Suggested change
if: needs.validate-branch.outputs.is_deployable == 'true'
# Checkout is required to reference the local ./.github/actions/cpflow-validate-config action.
- name: Checkout repository


on:
workflow_dispatch:
inputs:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contents: write permission is only required by the gh release create step near the end of the job. All other steps need at most contents: read. Since this is a workflow_dispatch-only workflow the blast radius is low, but a comment would help auditors understand why write is needed here:

Suggested change
inputs:
permissions:
contents: write # Required for `gh release create` in the promotion summary step

CPFLOW_PR_FRIENDLY: ${{ inputs.pull_request_friendly }}
CPFLOW_EVENT_NAME: ${{ github.event_name }}
run: |
set -euo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameter expansion trims strip exactly one leading/trailing space. An entry with two or more leading spaces (e.g. " secret:FOO") would not be fully trimmed and the subsequent ${!name} indirect lookup would silently fail (looking up a variable named " secret:FOO" instead of "FOO").

Since the required: input is author-controlled YAML the practical risk is low, but a more robust trim would be:

Suggested change
set -euo pipefail
entry="${entry#"${entry%%[! ]*}"}"
entry="${entry%"${entry##*[! ]}"}"

Or with extglob: entry="${entry##*([[:space:]])}"; entry="${entry%%*([[:space:]])}" after shopt -s extglob.

@claude
Copy link
Copy Markdown

claude Bot commented May 1, 2026

Code Review: Replace hand-written GitHub Actions with generated cpflow-* flow

Overview

This PR replaces all the hand-written Control Plane CI/CD actions and workflows (+1790 / -1147 lines) with a generated cpflow-* action/workflow set. The new code is meaningfully more correct, more secure, and better structured than what it replaces. The overall direction is clearly right.


Security Fixes (significant wins)

Old New
cpln profile update default … --token "$TOKEN" — token visible in ps / /proc Token passed via CPLN_TOKEN env var — never on argv
cpflow copy-image-from-upstream … -t ${{ secrets.CPLN_TOKEN_STAGING }} — same leak CPLN_UPSTREAM_TOKEN env var
${{ inputs.commit }} interpolated directly into shell All values passed through env: before shell use
/delete-review-app had no author_association check — any commenter could delete Now gated to OWNER, MEMBER, or COLLABORATOR
Old delete-app.sh allowed deleting any app unless its name contained the words production or staging New delete-app.sh rejects anything that doesn't start with $REVIEW_APP_PREFIX-

Other Improvements

  • cancel-in-progress: false on all deploy/delete/promote workflows — a cancelled mid-rollout cpflow deploy-image left apps in partially-deployed state before.
  • Auto-detect release phase--run-release-phase was previously always passed; now only added when release_script: is actually configured.
  • Multi-workload rollback in promote-staging-to-production — old code only rolled back the rails workload image. New code captures and restores every workload.
  • Fork PR handling — old deploy workflow ran on every fork PR push and silently failed when secrets were unavailable. New workflow skips at the job level and documents why.
  • Stale cleanup --org arg — old nightly-remove-stale-review-apps ran cpflow cleanup-stale-apps without --org. New workflow passes it explicitly.

Issues / Observations

1. detect-release-phase runs before the app existence check in cpflow-deploy-review-app.yml

The Detect release phase support step at line 163 runs before Check if review app exists at line 170. On a first-ever /deploy-review-app trigger the app doesn't exist yet, so cpflow config exits non-zero. With set -euo pipefail inside an if, the failure is swallowed and flag= is emitted (safe fallback), but stderr is still printed and may confuse operators reading the log. Consider moving this step after setup-review-app, or gating it on steps.check-app.outputs.exists == 'true'.

2. Fragile error-pattern matching in delete-app.sh

The case statement at line 34 distinguishes auth/API errors from app-not-found by matching substrings of cpflow exists output. This is already called out as a TODO, but deserves a tracking issue: if cpflow changes its error messages, misconfigured-token failures would silently be treated as "app does not exist" and deletion would be skipped with a success exit code.

3. Verbose step-level conditions in cpflow-deploy-review-app.yml

The guard steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success') is copy-pasted across ~10 steps. A newly added step that accidentally omits the if: line will execute unconditionally. Splitting into two jobs (check/setup and build/deploy) and using needs: gating would make this structural.

4. cpflow-deploy-staging.yml reinstalls Ruby + CLI in both build and deploy jobs

Each job calls cpflow-setup-environment, which runs npm install -g and gem install. Adding a tool cache would halve install time on every staging deploy without any correctness risk.


Minor / Nits

  • cpflow-review-app-help.yml correctly uses pull_request_target with no checkout step — the inline comment explains the reasoning well.
  • The randomized token_delimiter in cpflow-setup-environment (CPLN_TOKEN_$(openssl rand -hex 8)) is a correct defense against heredoc-terminator injection and worth keeping.
  • cpflow-help.md calls out that changing STAGING_APP_BRANCH also requires updating the branches: filter in cpflow-deploy-staging.yml — good documentation.

Overall: The security fixes alone make this worth merging. The issues noted are non-blocking: observability noise on first deploy (issue 1), a known TODO that should be tracked as an issue (issue 2), and two style/performance suggestions.

cpln_cli_version: ${{ vars.CPLN_CLI_VERSION }}
cpflow_version: ${{ vars.CPFLOW_VERSION }}

- name: Detect release phase support
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering issue: release phase detection runs before app existence is known

cpflow config -a "${APP_NAME}" will fail (and print to stderr) if the app doesn't yet exist — which is always the case on a first /deploy-review-app. Because the command is inside an if, set -e doesn't abort, and flag= is safely emitted as the fallback. But the error output appears in the log and can mislead operators.

Consider moving this step after setup-review-app (or gating it with steps.check-app.outputs.exists == 'true'), so it only runs against an app that's confirmed to exist:

Suggested change
- name: Detect release phase support
- name: Detect release phase support
if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')
id: release-phase
uses: ./.github/actions/cpflow-detect-release-phase
with:
app_name: ${{ env.APP_NAME }}

exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile string-matching (acknowledged TODO — needs a tracking issue)

This case pattern distinguishes real API/auth errors from app not found by matching substrings of cpflow exists output. If cpflow ever changes or adds error messages, a real failure would silently fall through as "app does not exist" and exit 0 — meaning a delete triggered on an app whose token is invalid would do nothing and report success.

The TODO comment documents this well, but a GitHub issue to track the upstream cpflow exists --json or structured exit-code request would make sure it doesn't get forgotten:

Suggested change
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)

Suggestion: open a cpflow upstream issue requesting a distinct exit code for "not found" vs "auth/API error" and link it here.

});

- name: Build Docker image
if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated gate condition is copy-paste fragile

This if: guard appears on ~10 consecutive steps:

steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')

A newly added step that accidentally omits it will run unconditionally — silently posting a PR comment or creating a GitHub deployment even when config isn't ready.

A structural fix would be to split this into two jobs:

  • preflight — validates config, resolves PR ref, checks app existence, creates app if needed
  • build-and-deploy — gated via needs: preflight with a job-level if: needs.preflight.outputs.ready == 'true'

This pushes the gate up to needs: so individual step omissions can't escape it.


# Later workflow steps call cpflow/cpln, so persist the token without putting it on argv.
# This intentionally gives subsequent trusted steps in the same job a CPLN_TOKEN env var.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct pattern — worth a note for future maintainers

The randomized delimiter (CPLN_TOKEN_$(openssl rand -hex 8)) is the right way to write a multiline secret to $GITHUB_ENV. A fixed delimiter like EOF would be vulnerable to heredoc injection if the token value happened to contain that string on a line by itself (unlikely for a base64-encoded token, but still the right habit).

One thing to be aware of: once this step completes, CPLN_TOKEN is in the process environment for all subsequent steps in the same job. This is intentional (documented in the comment above), but if ACTIONS_STEP_DEBUG=true is ever enabled on a run, GitHub will dump each step's env before commands execute. GitHub's secret masking should redact the token value in those dumps since it was originally sourced from ${{ secrets.* }}, but it's worth confirming this in your GitHub org's audit settings if you operate in a high-security context.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Health check cpln failure aborts retry loop prematurely
    • Guarded the workload endpoint lookup so transient cpln or jq failures clear the endpoint and allow the polling loop to retry.
  • ✅ Fixed: Error message header repeats for each missing item
    • Split the error output into one header line followed by a separate printf for all missing items.

Create PR

Or push these changes by commenting:

@cursor push f419c0c97c
Preview (f419c0c97c)
diff --git a/.github/actions/cpflow-validate-config/action.yml b/.github/actions/cpflow-validate-config/action.yml
--- a/.github/actions/cpflow-validate-config/action.yml
+++ b/.github/actions/cpflow-validate-config/action.yml
@@ -71,5 +71,6 @@
           exit 0
         fi
 
-        printf 'Missing required GitHub configuration:\n- %s\n' "${missing[@]}" >&2
+        echo "Missing required GitHub configuration:" >&2
+        printf -- '- %s\n' "${missing[@]}" >&2
         exit 1

diff --git a/.github/actions/cpflow-wait-for-health/action.yml b/.github/actions/cpflow-wait-for-health/action.yml
--- a/.github/actions/cpflow-wait-for-health/action.yml
+++ b/.github/actions/cpflow-wait-for-health/action.yml
@@ -61,7 +61,10 @@
         for attempt in $(seq 1 "${CPFLOW_MAX_RETRIES}"); do
           echo "Health check attempt ${attempt}/${CPFLOW_MAX_RETRIES}"
 
-          endpoint="$(cpln workload get "${CPFLOW_WORKLOAD_NAME}" --gvc "${CPFLOW_APP_NAME}" --org "${CPFLOW_ORG}" -o json | jq -r '.status.endpoint // empty')"
+          if ! endpoint="$(cpln workload get "${CPFLOW_WORKLOAD_NAME}" --gvc "${CPFLOW_APP_NAME}" --org "${CPFLOW_ORG}" -o json | jq -r '.status.endpoint // empty')"; then
+            echo "Unable to fetch workload endpoint; will retry if attempts remain."
+            endpoint=""
+          fi
           if [[ -n "${endpoint}" ]]; then
             http_status="$(curl -s -o /dev/null -w '%{http_code}' --max-time "${CPFLOW_CURL_MAX_TIME}" "${endpoint}" 2>/dev/null || echo 000)"
             echo "Endpoint: ${endpoint}, HTTP status: ${http_status}"

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit fee6ce9. Configure here.

for attempt in $(seq 1 "${CPFLOW_MAX_RETRIES}"); do
echo "Health check attempt ${attempt}/${CPFLOW_MAX_RETRIES}"

endpoint="$(cpln workload get "${CPFLOW_WORKLOAD_NAME}" --gvc "${CPFLOW_APP_NAME}" --org "${CPFLOW_ORG}" -o json | jq -r '.status.endpoint // empty')"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Health check cpln failure aborts retry loop prematurely

Medium Severity

The cpln workload get call on line 64 runs under set -euo pipefail with no error guard. A transient API or network failure causes the entire script to exit immediately instead of retrying on the next loop iteration. The curl call on line 66 is correctly guarded with || echo 000, showing retry resilience was intended, but the cpln call lacks similar protection. In the production promotion workflow, this health check failure triggers a rollback of a potentially healthy deployment.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit fee6ce9. Configure here.

exit 0
fi

printf 'Missing required GitHub configuration:\n- %s\n' "${missing[@]}" >&2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message header repeats for each missing item

Low Severity

printf 'Missing required GitHub configuration:\n- %s\n' "${missing[@]}" repeats the entire format string for every element in the missing array. When multiple secrets or variables are missing, the "Missing required GitHub configuration:" header is printed once per item, producing confusing duplicate output instead of a single header followed by a list.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit fee6ce9. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant