fix(release): pin macOS gateway supervisor image tag #1260

Open
TaylorMutch wants to merge 1 commit into main from fix/macos-gateway-supervisor-image-tag

@TaylorMutch
Collaborator

Summary

  • Bake OPENSHELL_IMAGE_TAG into the macOS standalone gateway binary so the supervisor image lookup uses a valid Docker reference instead of falling through to a CARGO_PKG_VERSION that contains + build metadata.
  • Patches both release-dev.yml and release-tag.yml plus Dockerfile.gateway-macos. PR #1259 (fix(docker): use supervisor image path directly) covered the Linux gateway binary; this is the macOS counterpart.

Diagnostics

A user reported that install-dev.sh fails on Apple Silicon: brew services list shows the gateway service as "started", but the gateway is actually crash-looping.

$ launchctl print gui/$(id -u)/homebrew.mxcl.openshell | grep 'last exit\|runs'
        runs = 43
        last exit code = 1

The launchd error log:

$ tail /opt/homebrew/var/log/openshell/openshell-gateway.err.log
Error:   × execution error: failed to create compute runtime: configuration error:
  │ failed to inspect docker supervisor image 'ghcr.io/nvidia/openshell/
  │ supervisor:0.0.37-dev.147+g084c93b6a': Docker responded with status code
  │ 400: invalid reference format

The + is illegal in a Docker image tag: after the colon, a reference accepts only [A-Za-z0-9_.-], up to 128 characters, and the tag may not start with . or -. The +g… suffix is SemVer build metadata that should never reach an image tag.
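To make the tag rule concrete, here is a minimal sketch of a validator for the Docker tag grammar `[\w][\w.-]{0,127}`. The function name `is_valid_docker_tag` is hypothetical and is not part of the OpenShell codebase; it only illustrates why the patched Cargo version is rejected.

```rust
/// Returns true if `tag` matches the Docker tag grammar `[\w][\w.-]{0,127}`:
/// ASCII letters, digits, `_`, `.`, `-`; at most 128 characters; the first
/// character may not be `.` or `-`.
/// Hypothetical helper for illustration; not taken from the OpenShell source.
fn is_valid_docker_tag(tag: &str) -> bool {
    let mut chars = tag.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphanumeric() || c == '_',
        None => return false, // an empty tag is never valid
    };
    first_ok
        && tag.len() <= 128
        // Remaining characters: anything outside the allowed set (such as
        // '+') makes the whole tag invalid.
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}
```

With this check, `0.0.37-dev.147+g084c93b6a` is rejected because of the `+`, while `dev` is accepted.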

Root cause

default_docker_supervisor_image_tag() at crates/openshell-driver-docker/src/lib.rs:92 resolves the supervisor tag at compile time in this order:

  1. option_env!("OPENSHELL_IMAGE_TAG")
  2. option_env!("IMAGE_TAG")
  3. env!("CARGO_PKG_VERSION")

The dev release pipeline patches the workspace Cargo version to a git-derived string like 0.0.37-dev.147+g084c93b6a. When OPENSHELL_IMAGE_TAG is not set at build time, the binary falls all the way through to that patched version and bakes the + into the default supervisor reference.
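The fallback order can be modeled as a pure function. This is a sketch under stated assumptions, not the body of default_docker_supervisor_image_tag(): the name `resolve_supervisor_tag` is hypothetical, the three compile-time values are passed in as parameters so the behavior can be exercised directly, and any special handling of empty strings in the real function is not modeled.

```rust
/// Pure model of the documented lookup order. The real function reads these
/// values at compile time; here they are ordinary parameters.
/// `resolve_supervisor_tag` is a hypothetical name used for illustration.
fn resolve_supervisor_tag<'a>(
    openshell_image_tag: Option<&'a str>, // stands in for option_env!("OPENSHELL_IMAGE_TAG")
    image_tag: Option<&'a str>,           // stands in for option_env!("IMAGE_TAG")
    cargo_pkg_version: &'a str,           // stands in for env!("CARGO_PKG_VERSION")
) -> &'a str {
    // First set value wins; the Cargo package version is the last resort.
    openshell_image_tag
        .or(image_tag)
        .unwrap_or(cargo_pkg_version)
}
```

Called as `resolve_supervisor_tag(None, None, "0.0.37-dev.147+g084c93b6a")`, the sketch returns the patched Cargo version unchanged, which is the failing case described above. Passing `Some("dev")` as the first argument short-circuits the fallback.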

In .github/workflows/release-dev.yml:

  • build-cli-macos (lines 358–365): passes both OPENSHELL_CARGO_VERSION and OPENSHELL_IMAGE_TAG=dev
  • build-gateway-binary-linux (line 432): sets OPENSHELL_IMAGE_TAG via env since PR #1259 (fix(docker): use supervisor image path directly)
  • build-gateway-binary-macos (lines 504–513): missing OPENSHELL_IMAGE_TAG

release-tag.yml had the same gap for tagged releases on macOS.

Dockerfile.gateway-macos also did not declare ARG OPENSHELL_IMAGE_TAG, so even with --build-arg the value was never visible to cargo build. The companion Dockerfile.cli-macos declares both ARGs (OPENSHELL_CARGO_VERSION and OPENSHELL_IMAGE_TAG) near the final cargo step, after the dependency layers are cached; this PR mirrors that pattern.

Changes

  • release-dev.yml: pass --build-arg OPENSHELL_IMAGE_TAG=${{ github.sha }} to the macOS gateway buildx invocation, matching the Linux fix in #1259.
  • release-tag.yml: pass --build-arg OPENSHELL_IMAGE_TAG=${{ needs.compute-versions.outputs.source_sha }}, matching the Linux fix in #1259.
  • Dockerfile.gateway-macos: declare ARG OPENSHELL_IMAGE_TAG next to ARG OPENSHELL_CARGO_VERSION, after the dependency-build cache layers (matches the cli-macos Dockerfile, including the comment explaining the placement).

The supervisor image is published with <github.sha> / <source_sha> tags by build-supervisor -> docker-build.yml and re-tagged as dev / latest / <semver> by tag-ghcr-{dev,release}. Pinning the gateway binary to the SHA is consistent with the Linux fix and points to a tag that is guaranteed to exist before the release notes are published.

Test plan

  • mise run pre-commit passes locally (done before push).
  • CI on this branch builds the macOS gateway binary without errors.
  • After merge + dev re-release, run install-dev.sh on an Apple Silicon host:
    • brew services info openshell shows Running: true
    • curl http://127.0.0.1:17670/health returns 2xx
    • /opt/homebrew/var/log/openshell/openshell-gateway.err.log does not contain invalid reference format
  • openshell sandbox create succeeds against the local gateway, confirming the supervisor image is pulled successfully.

Workaround for users on the broken dev build

Until a new dev release is cut, set the runtime override (OPENSHELL_DOCKER_SUPERVISOR_IMAGE) in the launchd plist:

plutil -insert EnvironmentVariables.OPENSHELL_DOCKER_SUPERVISOR_IMAGE \
  -string "ghcr.io/nvidia/openshell/supervisor:dev" \
  ~/Library/LaunchAgents/homebrew.mxcl.openshell.plist
brew services restart openshell

Note: brew services regenerates the plist from the formula, so this needs to be re-applied after each install-dev.sh run.
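The workaround relies on the runtime override taking precedence over the tag baked in at compile time. A rough sketch of that precedence follows. How the gateway actually reads OPENSHELL_DOCKER_SUPERVISOR_IMAGE is not shown in this PR, so the function name `supervisor_image`, the treatment of an empty override as unset, and the way the default reference is assembled are all assumptions made for illustration.

```rust
/// Image repository as it appears in the error log quoted in Diagnostics.
const SUPERVISOR_REPO: &str = "ghcr.io/nvidia/openshell/supervisor";

/// Hypothetical sketch: a non-empty runtime override is used verbatim;
/// otherwise the repository is combined with the compile-time default tag.
fn supervisor_image(runtime_override: Option<&str>, default_tag: &str) -> String {
    match runtime_override {
        // Assumption: a blank override is treated the same as no override.
        Some(image) if !image.trim().is_empty() => image.trim().to_string(),
        _ => format!("{}:{}", SUPERVISOR_REPO, default_tag),
    }
}
```

In a real binary the first argument would plausibly come from something like `std::env::var("OPENSHELL_DOCKER_SUPERVISOR_IMAGE").ok()`. With the override set to `ghcr.io/nvidia/openshell/supervisor:dev`, the invalid default tag is never consulted.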

The macOS standalone gateway binary was built without
OPENSHELL_IMAGE_TAG, so default_docker_supervisor_image_tag()
fell through to CARGO_PKG_VERSION. The dev release pipeline
patches that to e.g. 0.0.37-dev.147+g084c93b6a, leaving a
'+' in the supervisor image tag which Docker rejects with
'invalid reference format', causing the gateway to crash-loop
on Apple Silicon dev installs.

PR #1259 fixed this for the Linux gateway binary but the macOS
build path (which goes through deploy/docker/Dockerfile.gateway-macos
under osxcross) was not covered.

- release-dev.yml: pass OPENSHELL_IMAGE_TAG=<github.sha> to the
  macOS gateway docker build, matching the Linux fix.
- release-tag.yml: pass OPENSHELL_IMAGE_TAG=<source_sha> to the
  macOS gateway docker build, matching the Linux fix.
- Dockerfile.gateway-macos: declare ARG OPENSHELL_IMAGE_TAG so
  the build arg actually reaches cargo (matches the cli-macos
  Dockerfile pattern, including the comment about ARG placement
  to avoid invalidating dependency-build cache layers).
@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-actions

github-actions Bot commented May 8, 2026

@drew marked this pull request as ready for review May 8, 2026 02:20
@drew requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 8, 2026 02:20