
feat(gpu): honor device IDs in Docker and Podman #1253

Open
elezar wants to merge 4 commits into main from feat/docker-podman-gpu-device-id

Member

@elezar elezar commented May 7, 2026

Summary

Honor the existing GPU device ID field for Docker and Podman GPU sandboxes without changing the protobuf API shape.

Related Issue

None.

Changes

  • Added a shared helper that maps the existing GPU fields to CDI device IDs.
  • Updated Docker device requests to pass explicit GPU device IDs through and keep the default all-GPU CDI request.
  • Updated Podman container devices with the same explicit GPU device ID handling.
  • Added Rust e2e coverage for Docker and Podman GPU device selection, including default GPU requests, per-index CDI IDs, nvidia.com/gpu=all, and invalid device IDs.
  • For per-index CDI selection, the e2e test now verifies the selected physical GPU by UUID because containers may renumber a single visible GPU to ordinal 0.
  • Switched the GPU CI workflow from the Python/k3s-oriented GPU suite to the Docker GPU e2e task.
  • Repointed e2e:gpu to Docker GPU coverage and kept the previous Python GPU task available as e2e:k3s:gpu.
  • Updated Docker and Podman driver documentation notes.
  • Fixed the Helm README markdown lint issue that was blocking pre-commit.
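
To illustrate the mapping described in the first bullets, here is a minimal sketch of what such a shared helper could look like. It is not the code from this PR: the function name `cdi_device_ids`, its parameters, and the constant name are hypothetical, and the only value taken from the description above is the default `nvidia.com/gpu=all` CDI name. The sketch assumes an explicit device ID is passed through unchanged and that validation of invalid IDs is left to the container runtime.

```rust
/// CDI name that requests every NVIDIA GPU (the default mentioned in the PR description).
/// The constant name itself is hypothetical.
const DEFAULT_GPU_CDI_DEVICE: &str = "nvidia.com/gpu=all";

/// Hypothetical helper: map a GPU request to the CDI device names that a
/// Docker device request or a Podman container device list would carry.
///
/// * `gpu_requested`: whether the sandbox asked for a GPU at all (assumed field).
/// * `device_id`: optional explicit device ID, e.g. "nvidia.com/gpu=1" (assumed field).
fn cdi_device_ids(gpu_requested: bool, device_id: Option<&str>) -> Vec<String> {
    if !gpu_requested {
        // No GPU requested: no CDI devices to attach.
        return Vec::new();
    }
    match device_id.map(str::trim) {
        // An explicit, non-empty ID is passed through as-is (no validation here).
        Some(id) if !id.is_empty() => vec![id.to_string()],
        // A missing or blank ID falls back to the all-GPU request.
        _ => vec![DEFAULT_GPU_CDI_DEVICE.to_string()],
    }
}
```

Under these assumptions, `cdi_device_ids(true, None)` yields the all-GPU request, while `cdi_device_ids(true, Some("nvidia.com/gpu=1"))` selects a single GPU by index. Keeping the mapping in one function is what lets the Docker and Podman drivers share the same behavior.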

Testing

  • cargo test --manifest-path e2e/rust/Cargo.toml --features e2e-docker-gpu --test gpu_device_selection --no-run
  • cargo test --manifest-path e2e/rust/Cargo.toml --features e2e-podman-gpu --test gpu_device_selection --no-run
  • cargo check -p openshell-cli
  • git diff --check
  • .github/workflows/e2e-gpu-test.yaml YAML parse check
  • mise tasks lists e2e:gpu, e2e:k3s:gpu, e2e:docker:gpu, and e2e:python:gpu
  • mise run e2e:docker:gpu validated on a dGPU-based system
  • mise run markdown:lint:md
  • mise run pre-commit
  • mise run e2e:podman:gpu still needs to be run

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@elezar elezar requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 7, 2026 21:24
@elezar elezar force-pushed the feat/docker-podman-gpu-device-id branch from dc3cae4 to 9735e15 Compare May 7, 2026 21:53
@elezar elezar added the test:e2e-gpu Requires GPU end-to-end coverage label May 7, 2026

github-actions Bot commented May 7, 2026

Label test:e2e-gpu applied for 293b1f0. Open the existing run and click Re-run all jobs to execute with the label set. The E2E Gate check on this PR will flip green automatically once the run finishes.

@elezar elezar force-pushed the feat/docker-podman-gpu-device-id branch from 5af3636 to 3517ac2 Compare May 7, 2026 22:20

Labels

test:e2e-gpu Requires GPU end-to-end coverage
