From a74398c59f4f52a656b95675cd259c839c8d8230 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Thu, 7 May 2026 11:33:38 -0700 Subject: [PATCH 1/3] docs(podman): restore driver architecture details Signed-off-by: Drew Newberry --- crates/openshell-driver-podman/README.md | 360 ++++++++++++++++++--- docs/reference/sandbox-compute-drivers.mdx | 8 + 2 files changed, 327 insertions(+), 41 deletions(-) diff --git a/crates/openshell-driver-podman/README.md b/crates/openshell-driver-podman/README.md index 1193416e1..f5c856928 100644 --- a/crates/openshell-driver-podman/README.md +++ b/crates/openshell-driver-podman/README.md @@ -3,10 +3,27 @@ Podman-backed compute driver for rootless and single-machine OpenShell deployments. -The driver talks to the Podman libpod REST API over a Unix socket. It runs -in-process with the gateway server and creates one sandbox container per -sandbox. The `openshell-sandbox` supervisor inside the container still owns the -actual agent isolation. +The driver talks to the Podman libpod REST API over a Unix socket. The gateway +usually constructs it in-process, while the crate also ships an +`openshell-driver-podman` binary that exposes the shared compute-driver gRPC +surface for standalone use and tests. Each sandbox is one Podman container, and +the `openshell-sandbox` supervisor inside that container owns the actual agent +isolation. + +## Source Map + +All paths are relative to `crates/openshell-driver-podman/src/`. + +| File | Purpose | +|---|---| +| `lib.rs` | Crate root and public re-exports. | +| `main.rs` | Standalone driver binary, CLI/env parsing, and gRPC server startup. | +| `driver.rs` | Sandbox lifecycle, image pulls, network setup, endpoint detection, GPU checks, and rootless preflight checks. | +| `client.rs` | Async HTTP/1.1 client for Podman libpod APIs over a Unix socket. 
| +| `container.rs` | Podman container spec construction, environment ownership, labels, resources, capabilities, mounts, health checks, port mappings, secrets, and CDI devices. | +| `config.rs` | `PodmanComputeConfig`, image pull policy parsing, default socket paths, TLS validation, and redacted debug output. | +| `grpc.rs` | Tonic service adapter from the compute-driver protobuf API to the Rust driver methods. | +| `watcher.rs` | Initial state sync and live Podman event stream mapping into gateway watch events. | ## Runtime Model @@ -15,60 +32,321 @@ flowchart LR GW["Gateway"] -->|"in-process driver"| D["PodmanComputeDriver"] D -->|"HTTP over Unix socket"| P["Podman API"] P --> C["Sandbox container"] - C --> S["openshell-sandbox supervisor"] - S --> A["restricted agent child"] + C -->|"entrypoint"| S["openshell-sandbox supervisor"] + S -->|"nested netns + policy proxy"| A["restricted agent child"] + S -.->|"supervisor relay"| GW ``` -The container is the runtime boundary. Inside it, the supervisor creates a -nested network namespace, starts the policy proxy, applies Landlock/seccomp, and -launches the agent child as an unprivileged user. +The container is the outer runtime boundary. Inside it, the supervisor creates a +nested network namespace, starts the CONNECT policy proxy, applies +Landlock/seccomp controls, opens the supervisor relay back to the gateway, and +launches agent commands as the unprivileged sandbox user. + +The driver configures container runtime details only. It does not enforce +OpenShell filesystem, process, network, inference, or credential policy itself. +Those controls stay in `openshell-sandbox` so Podman, Docker, Kubernetes, and VM +runtimes share the same sandbox contract. 
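
The split above can be sketched as a deliberately narrow driver surface: the driver owns container lifecycle, while policy enforcement lives behind the supervisor contract inside the container. A minimal illustrative shape — `SandboxSpec`, `ComputeDriver`, and the method names here are hypothetical, not this crate's real API:

```rust
/// Illustrative sandbox request: image plus a GPU flag, nothing policy-related.
pub struct SandboxSpec {
    pub id: String,
    pub image: String,
    pub gpu: bool,
}

/// Hypothetical driver surface: lifecycle only. Filesystem, process, network,
/// inference, and credential policy are deliberately absent -- the
/// `openshell-sandbox` supervisor inside the container owns those.
pub trait ComputeDriver {
    fn create(&self, spec: &SandboxSpec) -> Result<String, String>;
    fn delete(&self, sandbox_id: &str) -> Result<(), String>;
}

/// Toy in-memory driver used only to show the contract shape.
pub struct NoopDriver;

impl ComputeDriver for NoopDriver {
    fn create(&self, spec: &SandboxSpec) -> Result<String, String> {
        if spec.id.is_empty() {
            return Err("sandbox id must not be empty".into());
        }
        // One sandbox maps to one container; return its (illustrative) name.
        Ok(format!("openshell-{}", spec.id))
    }

    fn delete(&self, _sandbox_id: &str) -> Result<(), String> {
        // Delete is idempotent: a missing container is not an error.
        Ok(())
    }
}
```

Because the surface carries no policy knobs, the same contract can sit in front of Podman, Docker, Kubernetes, or a VM backend without the sandbox guarantees changing.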
+ +## Driver Comparison + +| Aspect | Kubernetes | Docker | VM | Podman | +|---|---|---|---|---| +| Driver shape | In-process | In-process | Gateway-spawned subprocess | In-process, with standalone binary support | +| Backend | Kubernetes API | Docker daemon | libkrun and gvproxy | Podman libpod REST API over UDS | +| Outer boundary | Pod | Container | MicroVM | Container | +| Supervisor delivery | Supervisor image or init copy into pod volume | Extracted or mounted supervisor binary | Embedded guest bundle | Read-only OCI image volume | +| Callback path | Pod to gateway service or endpoint | Host networking | gvproxy host-loopback NAT | `host.containers.internal` or explicit endpoint | +| SSH transport | Supervisor relay | Supervisor relay | Supervisor relay | Supervisor relay | +| GPU support | `nvidia.com/gpu` resource | CDI when daemon supports it | Experimental VFIO path | CDI device request when NVIDIA devices exist | +| State owner | Kubernetes API | Docker daemon | Driver state dir | Podman daemon | + +## Startup Checks + +`PodmanComputeDriver::new` validates the host before accepting sandbox work: + +- Verifies the configured Podman socket path exists, then pings `/_ping`. +- Fetches `/libpod/info` and rejects cgroups v1 because rootless Podman needs + cgroups v2. +- Logs the Podman network backend and whether Podman reports rootless mode. +- Warns when the current user appears to lack `/etc/subuid` or `/etc/subgid` + ranges. This is not a hard failure because some systems provide subordinate + IDs through directory services. +- Creates or reuses the configured bridge network with DNS enabled. +- Auto-detects the sandbox callback endpoint when `OPENSHELL_GRPC_ENDPOINT` is + unset. + +The default socket path is `$XDG_RUNTIME_DIR/podman/podman.sock` on Linux, with +`/run/user//podman/podman.sock` as the fallback. On macOS it is +`$HOME/.local/share/containers/podman/machine/podman.sock`. 
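
The Linux default-socket selection described above can be sketched as a small resolution function. This is a hedged approximation — `default_socket_path` is a hypothetical helper, the assumption being that the fallback path interpolates the numeric user ID, and the macOS machine-socket path is handled as a separate platform branch:

```rust
/// Sketch of Linux default Podman socket resolution: prefer $XDG_RUNTIME_DIR,
/// then fall back to the conventional /run/user/<uid> location.
/// (Illustrative only; the crate's real resolution logic may differ.)
pub fn default_socket_path(xdg_runtime_dir: Option<&str>, uid: u32) -> String {
    match xdg_runtime_dir {
        Some(dir) if !dir.is_empty() => format!("{dir}/podman/podman.sock"),
        _ => format!("/run/user/{uid}/podman/podman.sock"),
    }
}
```

Operators can always bypass the default entirely with `OPENSHELL_PODMAN_SOCKET`.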
## Supervisor Delivery -Podman uses an OCI image volume to mount the supervisor binary read-only at -`/opt/openshell/bin`. The supervisor image is built from the `supervisor` target -in `deploy/docker/Dockerfile.images`. +Podman uses an OCI image volume to mount the supervisor image read-only at +`/opt/openshell/bin`. The supervisor image target in +`deploy/docker/Dockerfile.images` copies the `openshell-sandbox` binary to +`/openshell-sandbox`; mounting that image at `/opt/openshell/bin` makes the +binary available as `/opt/openshell/bin/openshell-sandbox`. + +The container spec sets that binary as the entrypoint. This avoids relying on +the sandbox image entrypoint or command, which might otherwise append the +supervisor path as an argument to an image-provided shell. + +This model keeps the supervisor outside the mutable sandbox image without using +a hostPath-style bind mount. + +## Container Contract + +The generated libpod create spec sets security-critical fields directly and +lets driver-owned values override template values. + +| Setting | Value | Purpose | +|---|---|---| +| `user` | `0:0` | The supervisor starts as root inside the container so it can create namespaces, configure mounts, and install sandbox controls. | +| `entrypoint` | `/opt/openshell/bin/openshell-sandbox` | Runs the supervisor directly regardless of the sandbox image entrypoint. | +| `volumes` | Named volume mounted at `/sandbox` | Provides the sandbox workspace. | +| `image_volumes` | Supervisor image mounted read-only at `/opt/openshell/bin` | Sideloads the supervisor binary. | +| `netns` | `bridge` | Attaches the container to the configured Podman bridge network. | +| `portmappings` | Container SSH port to host port `0` | Requests an ephemeral host port for compatibility and health/debug paths. | +| `hostadd` | `host.containers.internal` and `host.openshell.internal` to `host-gateway` | Gives containers stable names for services on the gateway host. 
| +| `mounts` | Private tmpfs at `/run/netns` | Lets the supervisor create named network namespaces under rootless Podman. | +| `no_new_privileges` | `true` | Prevents privilege escalation through exec. | +| `seccomp_profile_path` | `unconfined` | Avoids Podman's container-level profile blocking Landlock/seccomp setup before the supervisor installs its own policy-aware filter. | -This keeps the supervisor outside the mutable sandbox image while avoiding a -hostPath-style bind mount. +The agent child loses the supervisor's privileges before user code runs. -## Rootless Adaptations +## Capabilities -Rootless Podman has stricter capability behavior than Kubernetes. The container -spec drops all capabilities and adds back only the supervisor capabilities it -needs: +Podman's default container capability set is restricted. The driver drops +capabilities the supervisor does not need and adds the extra ones required for +OpenShell isolation. -- `SYS_ADMIN` for namespace and Landlock setup. -- `NET_ADMIN` for nested network namespace routing. -- `SYS_PTRACE` and `DAC_READ_SEARCH` for process identity inspection. -- `SYSLOG` for bypass diagnostics. -- `SETUID` and `SETGID` for dropping to the sandbox user. +| Capability | Purpose | +|---|---| +| `SYS_ADMIN` | Namespace creation, Landlock setup, and seccomp filter installation. | +| `NET_ADMIN` | Veth, route, and iptables setup for the inner sandbox namespace. | +| `SYS_PTRACE` | `/proc//exe` inspection and ancestor walking for binary identity. | +| `SYSLOG` | `/dev/kmsg` access for bypass diagnostics. | +| `DAC_READ_SEARCH` | Cross-UID `/proc//fd` reads needed by proxy process identity checks in rootless Podman. | -The restricted agent child loses these privileges before user code runs. +The driver intentionally keeps Podman's default `SETUID`, `SETGID`, `CHOWN`, +and `FOWNER` capabilities because the supervisor needs them to drop privileges +and prepare writable sandbox directories. 
It drops unneeded defaults such as +`DAC_OVERRIDE`, `FSETID`, `KILL`, `NET_BIND_SERVICE`, `NET_RAW`, `SETFCAP`, +`SETPCAP`, and `SYS_CHROOT`. -## Network Model +## Rootless Networking -The driver creates or reuses a Podman bridge network for container-to-host -communication. The agent child does not use that bridge directly. The supervisor -creates a nested namespace and routes agent egress through the local CONNECT -proxy. +Podman networking is a stack of cooperating projects: -`host.containers.internal` is used for callbacks to the host gateway. Rootless -networking may use pasta under the hood; avoid assumptions that require -container-to-container L2 reachability. +| Component | Role | +|---|---| +| Podman | Container runtime and lifecycle orchestration. | +| Netavark | Network setup, bridge creation, IPAM, and firewall rules. | +| aardvark-dns | DNS for Podman bridge networks when DNS is enabled. | +| pasta | User-mode host connectivity for common rootless networking paths. | + +Rootful bridge networking can create host bridges, veth pairs, and firewall +rules directly. Rootless Podman cannot create those host-level interfaces as an +unprivileged user, so common rootless deployments use pasta to translate traffic +between the rootless network namespace and host sockets. The driver does not +configure pasta directly. It asks Podman for bridge mode on the configured +network and logs the backend reported by Podman. + +The important operational constraint is that the Podman bridge address range is +not a reliable host-routable address in rootless mode. Sandbox callbacks to the +gateway should use `host.containers.internal`, `host.openshell.internal`, or an +explicit `OPENSHELL_GRPC_ENDPOINT`, not the container's bridge IP. + +## Network Layers + +Podman-backed sandboxes have three network layers: + +```text +Host + | + | Gateway listens on the configured bind address and port. + | Rootless Podman may use pasta for host/container translation. 
+ | +Podman bridge network, default "openshell" + | + | Sandbox container default namespace. + | Supervisor, policy proxy, and relay client run here. + | +Inner sandbox network namespace + | + | Created by the supervisor with a veth pair. + | Agent processes run here as the sandbox user. +``` + +The driver creates or reuses the Podman bridge with DNS enabled. The supervisor +then creates the inner namespace, configures a veth pair, and routes ordinary +agent egress through the local CONNECT proxy. The proxy evaluates destination, +binary identity, SSRF protections, TLS/L7 rules, and inference interception. + +The supervisor uses `nsenter --net=` for namespace operations instead of +`ip netns exec` so rootless containers avoid the sysfs remount path that needs +real host `CAP_SYS_ADMIN`. + +## Data Paths + +Sandbox-to-gateway callbacks use the endpoint in `OPENSHELL_ENDPOINT`. When the +gateway did not configure one, the Podman driver builds it from the gateway +port and TLS state: + +- `http://host.containers.internal:` when sandbox mTLS is not configured. +- `https://host.containers.internal:` when all three sandbox TLS paths are + configured. + +Interactive sessions use the supervisor relay. The CLI opens a session with the +gateway, the gateway sends `RelayOpen` over the existing supervisor session, and +the supervisor opens a relay stream back to the gateway. The supervisor then +bridges that stream to the Unix socket at `OPENSHELL_SSH_SOCKET_PATH`, usually +`/run/openshell/ssh.sock`. Sandbox SSH does not require direct ingress to the +container. + +Agent outbound traffic stays separate. The agent process connects to the local +proxy in the inner namespace. If policy allows the request, the proxy opens the +upstream connection from the container namespace and Podman carries it out +through the configured rootless or rootful network backend. 
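
The two callback bullets above reduce to a single scheme-selection rule on the gateway port and TLS state. A minimal sketch — the function name is illustrative, not the crate's real API:

```rust
/// Build the sandbox callback endpoint from the gateway port and whether the
/// complete sandbox mTLS path set is configured, mirroring the two cases above.
pub fn callback_endpoint(gateway_port: u16, mtls_configured: bool) -> String {
    let scheme = if mtls_configured { "https" } else { "http" };
    format!("{scheme}://host.containers.internal:{gateway_port}")
}
```

This derived value is only used when `OPENSHELL_GRPC_ENDPOINT` is empty; an explicit endpoint always wins.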
## Secrets and Environment -The SSH handshake secret is injected with Podman's `secret_env` API rather than -as a plain inspectable environment value. Sandbox identity, callback endpoint, -relay socket path, and command metadata are driver-controlled environment -variables and must override template values. +The SSH handshake secret is created as a Podman secret and injected with the +libpod `secret_env` map. That keeps it out of `podman inspect`, although it is +still an environment variable visible to the supervisor process before the +supervisor scrubs it from child environments. + +The container environment is built in priority order: -When TLS is configured, the driver mounts the client bundle read-only and sets -the standard `OPENSHELL_TLS_*` environment variables for the supervisor. +1. Sandbox spec and template environment. +2. Driver-controlled values that always overwrite user-supplied values. +3. TLS client paths when sandbox mTLS is enabled. + +Driver-controlled values include: + +- `OPENSHELL_SANDBOX` +- `OPENSHELL_SANDBOX_ID` +- `OPENSHELL_ENDPOINT` +- `OPENSHELL_SSH_SOCKET_PATH` +- `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS` +- `OPENSHELL_CONTAINER_IMAGE` +- `OPENSHELL_SANDBOX_COMMAND` + +Sandbox images and templates must not be allowed to spoof identity, callback, +relay, command metadata, or TLS path values. + +## TLS + +When all three Podman TLS paths are set, the driver treats sandbox callbacks as +mTLS callbacks: + +- `OPENSHELL_PODMAN_TLS_CA` +- `OPENSHELL_PODMAN_TLS_CERT` +- `OPENSHELL_PODMAN_TLS_KEY` + +The driver validates that these paths are provided as a complete set. Partial +configuration fails early instead of silently falling back to plaintext. 
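
The all-or-nothing rule can be sketched as a small validation step. This is a hypothetical helper under the stated contract — complete set enables mTLS, empty set means plaintext, anything partial is a hard error:

```rust
/// Returns Ok(true) when all three TLS paths are set, Ok(false) when none are,
/// and an error for any partial combination, so misconfiguration fails early
/// instead of silently falling back to plaintext.
pub fn validate_tls_paths(
    ca: Option<&str>,
    cert: Option<&str>,
    key: Option<&str>,
) -> Result<bool, String> {
    let set = [ca, cert, key].iter().filter(|p| p.is_some()).count();
    match set {
        0 => Ok(false),
        3 => Ok(true),
        _ => Err(
            "OPENSHELL_PODMAN_TLS_CA, OPENSHELL_PODMAN_TLS_CERT, and \
             OPENSHELL_PODMAN_TLS_KEY must be set together"
                .into(),
        ),
    }
}
```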
+ +When enabled, the files are mounted read-only into the container at: + +- `/etc/openshell/tls/client/ca.crt` +- `/etc/openshell/tls/client/tls.crt` +- `/etc/openshell/tls/client/tls.key` + +The driver also sets `OPENSHELL_TLS_CA`, `OPENSHELL_TLS_CERT`, and +`OPENSHELL_TLS_KEY` to those container-side paths. On SELinux systems, the bind +mounts include Podman's shared relabel option so the container process can read +the files. + +RPM installations generate a local PKI on first start and configure these paths +for the Podman driver. See `deploy/rpm/CONFIGURATION.md` for package-level +details. + +## Sandbox Lifecycle + +Create follows this order: + +1. Validate the sandbox name and ID, then validate the derived Podman resource + names before creating anything. +2. Pull or verify the supervisor image with the `missing` policy. +3. Pull or verify the sandbox image with `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY`. +4. Create the Podman secret for the SSH handshake secret. +5. Create the workspace volume. +6. Create the container from the generated spec. +7. Start the container. + +Failures roll back resources created earlier in the flow. A container name +conflict removes the new sandbox's workspace volume and handshake secret because +those resources are keyed by sandbox ID, not by the conflicting container. + +Delete is idempotent: + +1. Validate the sandbox ID and derived container name. +2. Best-effort inspect the container and warn if its sandbox ID label differs. +3. Stop the container using the configured timeout. +4. Force-remove the container and attached anonymous volumes. +5. Remove the workspace volume derived from the request sandbox ID. +6. Remove the handshake secret derived from the request sandbox ID. + +If the container is already gone, the driver still attempts volume and secret +cleanup and returns that no container existed. 
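
The rollback rule for create failures can be sketched with a reverse-order cleanup list. This is a simulation of the ordering only — the step names and return shape are illustrative, not the driver's actual types:

```rust
/// Simulated create flow: each successful step records a cleanup target, and a
/// failure unwinds them newest-first, mirroring the rollback rule above.
/// `fail_at` injects a failure at the given step index for demonstration.
pub fn create_with_rollback(
    fail_at: Option<usize>,
) -> (Result<(), String>, Vec<&'static str>) {
    let steps = ["secret", "volume", "container"];
    let mut created: Vec<&'static str> = Vec::new();
    let mut cleaned: Vec<&'static str> = Vec::new();
    for (i, step) in steps.iter().enumerate() {
        if fail_at == Some(i) {
            // Roll back everything created so far, newest first.
            while let Some(resource) = created.pop() {
                cleaned.push(resource);
            }
            return (Err(format!("step {step} failed")), cleaned);
        }
        created.push(step);
    }
    (Ok(()), cleaned)
}
```

A failure at the container step therefore removes the volume and then the secret, matching the observation that those resources are keyed by sandbox ID rather than by the conflicting container.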
+ +## Readiness + +The container health check accepts any of these readiness signals: + +- The legacy marker file `/var/run/openshell-ssh-ready` exists. +- The configured supervisor Unix socket path exists and is a socket. +- Something listens on the configured in-container SSH TCP port. + +The Unix socket check is the preferred relay-only path. The TCP port mapping is +kept for compatibility with older readiness and debug flows. ## GPU Support -GPU sandboxes use CDI device injection when `spec.gpu` is true and NVIDIA CDI -devices are available. The sandbox image must still include the user-space -libraries required by the workload. +The Podman driver reports GPU support when `/dev/nvidia0` exists on the gateway +host. If a sandbox requests GPU support and that device is missing, validation +fails before container creation. + +When GPU support is requested, the container spec includes the CDI device +request `nvidia.com/gpu=all`. The host must have NVIDIA CDI specs available to +Podman, and the sandbox image must include user-space libraries required by the +workload. + +## Configuration + +The gateway configures the in-process driver from gateway settings and selected +environment variables. The standalone `openshell-driver-podman` binary exposes +the same fields as CLI flags and env vars. + +| Env var | Standalone flag | Default | Purpose | +|---|---|---|---| +| `OPENSHELL_PODMAN_SOCKET` | `--podman-socket` | Platform default socket path | Podman API Unix socket. | +| `OPENSHELL_SANDBOX_IMAGE` | `--sandbox-image` | Gateway default sandbox image | Fallback OCI image for sandboxes that do not specify one. | +| `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY` | `--sandbox-image-pull-policy` | `missing` | Pull policy for sandbox images: `always`, `missing`, `never`, or `newer`. | +| `OPENSHELL_GRPC_ENDPOINT` | `--grpc-endpoint` | Auto-detected `host.containers.internal` URL | Callback endpoint injected into sandboxes. 
| +| `OPENSHELL_GATEWAY_PORT` | `--gateway-port` | `8080` | Gateway port used for endpoint auto-detection by the standalone binary. | +| `OPENSHELL_NETWORK_NAME` | `--network-name` | `openshell` | Podman bridge network name. | +| `OPENSHELL_SANDBOX_SSH_PORT` | `--sandbox-ssh-port` | `2222` | In-container SSH compatibility port. | +| `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` | `--sandbox-ssh-socket-path` | `/run/openshell/ssh.sock` | Supervisor Unix socket path for relay traffic. | +| `OPENSHELL_SSH_HANDSHAKE_SECRET` | `--ssh-handshake-secret` | Gateway-generated or required standalone | Shared secret for the NSSH1 handshake. | +| `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS` | `--ssh-handshake-skew-secs` | `300` | Allowed timestamp skew for SSH handshake validation. | +| `OPENSHELL_STOP_TIMEOUT` | `--stop-timeout` | `10` | Container stop timeout in seconds. | +| `OPENSHELL_SUPERVISOR_IMAGE` | `--supervisor-image` | `openshell/supervisor:latest` through the gateway, required standalone | OCI image that supplies `openshell-sandbox`. | +| `OPENSHELL_PODMAN_TLS_CA` | `--podman-tls-ca` | unset | Host path to the CA certificate mounted for sandbox mTLS. | +| `OPENSHELL_PODMAN_TLS_CERT` | `--podman-tls-cert` | unset | Host path to the client certificate mounted for sandbox mTLS. | +| `OPENSHELL_PODMAN_TLS_KEY` | `--podman-tls-key` | unset | Host path to the client private key mounted for sandbox mTLS. | + +## Operational Notes + +- Prefer explicit `OPENSHELL_GRPC_ENDPOINT` only when the auto-detected + `host.containers.internal` endpoint is not appropriate for the deployment. +- Keep the gateway bound to an address that sandbox containers can reach. RPM + deployments bind on `0.0.0.0` and rely on mTLS for access control. +- Avoid relying on Podman bridge IPs from the host in rootless deployments. + Use `host.containers.internal`, `host.openshell.internal`, published ports, or + the supervisor relay. +- Rootless networking behavior depends on the backend reported by Podman. 
The + driver logs that backend at startup for troubleshooting. +- For sandbox infrastructure changes, run the Podman e2e path and update this + README when the operator-facing contract changes. diff --git a/docs/reference/sandbox-compute-drivers.mdx b/docs/reference/sandbox-compute-drivers.mdx index 9f07c1e40..40c0b8223 100644 --- a/docs/reference/sandbox-compute-drivers.mdx +++ b/docs/reference/sandbox-compute-drivers.mdx @@ -38,6 +38,8 @@ Common gateway options: The gateway talks to the Docker daemon to create sandbox containers. Docker is also required for local image builds from directories or Dockerfiles. +For maintainer-level implementation details, refer to the [Docker driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-docker/README.md). + | Option | Environment variable | Description | |---|---|---| | `--drivers docker` | `OPENSHELL_DRIVERS=docker` | Select the Docker compute driver. | @@ -54,6 +56,8 @@ For GPU-backed Docker sandboxes, configure Docker CDI before starting the gatewa The gateway talks to the Podman API socket. The Podman driver requires Podman 5.x, cgroups v2, rootless networking, and an active Podman user socket. +For maintainer-level implementation details, refer to the [Podman driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-podman/README.md). + | Option | Environment variable | Description | |---|---|---| | `--drivers podman` | `OPENSHELL_DRIVERS=podman` | Select the Podman compute driver. | @@ -69,6 +73,8 @@ MicroVM-backed sandboxes run inside VM-backed isolation instead of a container b The gateway uses the VM compute driver to create VM-backed sandboxes. MicroVM requires host virtualization support. It uses [libkrun](https://github.com/containers/libkrun) with Apple's [Hypervisor framework](https://developer.apple.com/documentation/hypervisor) on macOS, KVM on Linux, and [QEMU](https://www.qemu.org/) for GPU-backed sandboxes on Linux. 
+For maintainer-level implementation details, refer to the [VM driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-vm/README.md). + | Option | Environment variable | Description | |---|---|---| | `--drivers vm` | `OPENSHELL_DRIVERS=vm` | Select the VM compute driver. VM is never auto-detected. | @@ -85,6 +91,8 @@ Kubernetes-backed sandboxes run as pods in the configured sandbox namespace. Use Helm deployments set Kubernetes driver values through the chart. +For maintainer-level implementation details, refer to the [Kubernetes driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-kubernetes/README.md). + | Gateway option | Environment variable | Helm value | Description | |---|---|---|---| | `--drivers kubernetes` | `OPENSHELL_DRIVERS=kubernetes` | Not applicable | Select the Kubernetes compute driver. | From c87224985570650cc1d636302f1bea5ebe987140 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Thu, 7 May 2026 12:05:30 -0700 Subject: [PATCH 2/3] docs(podman): split networking architecture notes Signed-off-by: Drew Newberry --- crates/openshell-driver-podman/NETWORKING.md | 436 +++++++++++++++ crates/openshell-driver-podman/README.md | 556 ++++++++++--------- docs/reference/sandbox-compute-drivers.mdx | 2 +- 3 files changed, 730 insertions(+), 264 deletions(-) create mode 100644 crates/openshell-driver-podman/NETWORKING.md diff --git a/crates/openshell-driver-podman/NETWORKING.md b/crates/openshell-driver-podman/NETWORKING.md new file mode 100644 index 000000000..1767927db --- /dev/null +++ b/crates/openshell-driver-podman/NETWORKING.md @@ -0,0 +1,436 @@ +# Rootless Podman Networking + +Deep-dive into how networking works in the Podman compute driver when running +rootless with pasta as the network backend. 
Covers the external tooling +(Podman, Netavark, pasta, aardvark-dns), the three nested namespace layers, and +the complete data paths for SSH, outbound traffic, and supervisor-to-gateway +communication. + +For the general Podman driver architecture, lifecycle, API surface, and driver +comparison, see [README.md](README.md). + +## Component Stack + +Podman's networking is composed of four independent projects: + +| Component | Language | Role | +|---|---|---| +| Podman | Go | Container runtime; orchestrates network lifecycle. | +| Netavark | Rust | Network backend; creates interfaces, bridges, firewall rules. | +| aardvark-dns | Rust | Authoritative DNS server for container name resolution. | +| pasta, part of passt | C | User-mode networking; L2-to-L4 socket translation for rootless containers. | + +The key split: rootful containers default to Netavark bridge networking with +real kernel interfaces, while rootless containers commonly use pasta user-mode +networking without needing host privileges. + +## How Netavark Works + +Netavark is invoked by Podman as an external binary. It reads a JSON network +configuration from STDIN and executes one of three commands: + +- `netavark setup ` creates interfaces, assigns IPs, and sets up + firewall rules for NAT and port-forwarding. +- `netavark teardown ` reverses setup and removes interfaces and + firewall rules. +- `netavark create` takes a partial network config and completes it by + assigning subnets and gateways. + +For rootful bridge networking: + +1. Podman creates a network namespace for the container. +2. Podman invokes `netavark setup` with the network config JSON. +3. Netavark creates a bridge, such as `podman0`, if it does not exist. The + default subnet is `10.88.0.0/16`. +4. Netavark creates a veth pair. One end goes into the container's netns and + the other attaches to the bridge. +5. Netavark assigns an IP from the subnet to the container's veth interface. +6. 
Netavark configures iptables or nftables rules for masquerade and port + mappings. +7. Netavark starts aardvark-dns when DNS is enabled, listening on the bridge + gateway address. + +```text +Host Kernel + | + +-- Bridge interface, such as "podman0" + | | + | +-- veth pair endpoint, host side, container 1 + | +-- veth pair endpoint, host side, container 2 + | + +-- Host physical interface, such as eth0 + | + +-- NAT, iptables or nftables rules managed by Netavark +``` + +Netavark also supports macvlan networks, where the container gets a +sub-interface of a physical host NIC with its own MAC address, and external +plugins via a documented JSON API. + +## How Pasta Works + +Unprivileged users cannot create network interfaces on the host. They cannot +create veth pairs, bridges, or iptables rules. Netavark's bridge approach +cannot work directly for rootless containers without an additional rootless +networking layer. + +Pasta, part of the `passt` project, operates in userspace and translates +between the container's L2 TAP interface and the host's L4 sockets. It requires +no capabilities or privileges. + +```text +Container Network Namespace + | + +-- TAP device, such as "eth0" + | ^ + | | L2 frames, Ethernet + | v + +-- pasta process, userspace + | + | Translation: L2 frames <-> L4 sockets + | + v + Host Network Stack, native TCP/UDP/ICMP sockets +``` + +For an outbound TCP connection from a container: + +1. The application calls `connect()` to an external address. +2. The kernel routes the packet through the default gateway to the TAP device. +3. Pasta reads the raw Ethernet frame from the TAP file descriptor. +4. Pasta parses L2/L3/L4 headers and identifies the TCP SYN. +5. Pasta opens a native TCP socket on the host and calls `connect()` to the + same destination. +6. When the host socket connects, pasta reflects the SYN-ACK back through the + TAP as an L2 frame. +7. 
For ongoing data transfer, pasta translates between TAP frames and the host + socket, coordinating TCP windows and acknowledgments between the two sides. + +Pasta does not maintain per-connection packet buffers. It reflects observed +sending windows and ACKs directly between peers. This is a thinner translation +layer than a full TCP/IP stack. + +### Built-in Services + +Pasta includes minimal network services so the container stack can +auto-configure: + +| Service | Purpose | +|---|---| +| ARP proxy | Resolves the gateway address to the host's MAC address. | +| DHCP server | Hands out a single IPv4 address, usually matching the host's upstream interface. | +| NDP proxy | Handles IPv6 neighbor discovery and SLAAC prefix advertisement. | +| DHCPv6 server | Hands out a single IPv6 address, usually matching the host's upstream interface. | + +By default there is no NAT. Pasta copies the host's IP addresses into the +container namespace. + +### Local Connection Bypass + +For connections between the container and the host, pasta implements a local +bypass path: + +- Packets with a local destination skip L2 translation. +- TCP uses `splice(2)`. +- UDP uses `recvmmsg(2)` and `sendmmsg(2)`. + +### Port Forwarding + +By default, pasta uses auto-detection. It scans `/proc/net/tcp` and +`/proc/net/tcp6` periodically and automatically forwards ports that are bound +and listening. Port forwarding is configurable through pasta options. + +### Security Properties + +Pasta is designed for rootless use: + +- No dynamic memory allocation after startup. +- All capabilities dropped, except `CAP_NET_BIND_SERVICE` when granted. +- Restrictive seccomp profile. +- Detaches into its own user, mount, IPC, UTS, and PID namespaces. +- No external dependencies beyond libc. + +### Inter-Container Limitation + +Unlike bridge networking, pasta containers are isolated from each other by +default. No virtual bridge connects them. 
Communication requires port mappings +through the host, pods with a shared network namespace, or opting into rootless +Netavark bridge networking with `podman network create`. + +## Three Nested Namespaces + +The Podman compute driver creates three layers of network isolation: + +```text +Namespace 1: Host + | + pasta manages port forwarding, such as 127.0.0.1: + gateway listens on its configured bind address and port + | +Namespace 2: Rootless Podman network namespace, managed by pasta + | + Bridge "openshell", often 10.89.x.0/24 + aardvark-dns for container name resolution + | + Container netns + supervisor, proxy, and relay client run here + | +Namespace 3: Inner sandbox netns, created by supervisor + | + veth pair, such as 10.200.0.1 <-> 10.200.0.2 + iptables forces ordinary traffic through proxy + user workload runs here +``` + +Pasta bridges namespace 1 and 2. The veth pair bridges namespace 2 and 3. The +proxy at the boundary of namespace 2 and 3 enforces network policy. + +### Layer 1 Pasta + +At driver startup, the driver ensures a Podman bridge network exists: + +```rust +client.ensure_network(&config.network_name).await?; +``` + +This creates a bridge network named `openshell` by default, with DNS enabled. +In rootless mode, this bridge can exist inside a user namespace managed by +pasta. The bridge IP range is not reliably routable from the host. + +```text +Host + | + 127.0.0.1:, pasta binds this on the host + | + pasta process, translates L4 sockets <-> L2 TAP frames + | + rootless network namespace + | + Bridge "openshell", such as 10.89.1.0/24 + | + +-- 10.89.1.1, bridge gateway and aardvark-dns + | + +-- veth to container netns + | + 10.89.1.2, container IP +``` + +### Layer 2 Container Networking + +The container spec configures: + +- `nsmode: "bridge"` to use the Podman bridge network. +- `networks` to attach to the configured bridge, `openshell` by default. 
+- `portmappings` with `host_port: 0`, `container_port: 2222`, and `protocol: + "tcp"` to publish the SSH compatibility port on an ephemeral host port. +- `hostadd` entries for `host.containers.internal:host-gateway` and + `host.openshell.internal:host-gateway`. + +Pasta is not explicitly configured by the driver. The driver requests bridge +mode and logs the network backend that Podman reports at startup. + +The `host.containers.internal` hostname is injected into `/etc/hosts` so the +supervisor can reach the gateway on the host. If `OPENSHELL_GRPC_ENDPOINT` is +empty, the driver auto-detects: + +```rust +if config.grpc_endpoint.is_empty() { + let scheme = if config.tls_enabled() { + "https" + } else { + "http" + }; + config.grpc_endpoint = + format!("{scheme}://host.containers.internal:{}", config.gateway_port); +} +``` + +The bridge gateway IP is not a stable substitute in rootless mode because it +can live inside the user namespace rather than on the host. + +### Layer 3 Inner Sandbox Network Namespace + +Inside the container, the supervisor creates another network namespace for the +user workload: + +```text +Container on the Podman bridge + | + Supervisor process, running in container's default netns + | + +-- Proxy listener at the inner namespace gateway address + | + +-- veth pair + | + +-- Inner network namespace + | + sandbox-side veth address + | + default route -> supervisor-side veth address + | + user code runs here + | + iptables rules: + ACCEPT -> proxy TCP + ACCEPT -> loopback + ACCEPT -> established/related + LOG -> TCP SYN bypass attempts + REJECT -> TCP + LOG -> UDP bypass attempts + REJECT -> UDP +``` + +The supervisor uses `nsenter --net=` rather than `ip netns exec` to avoid sysfs +remount issues that arise under rootless Podman where real host +`CAP_SYS_ADMIN` is unavailable. + +A tmpfs is mounted at `/run/netns` in the container spec so the supervisor can +create named network namespaces. 
In rootless Podman this directory does not +exist on the host, so a private tmpfs gives the supervisor its own writable +`/run/netns` without needing host filesystem access. + +## Complete Data Paths + +### SSH Session + +```text +Client, openshell CLI + | + 1. gRPC: CreateSshSession -> gateway, returns token and connect_path + 2. HTTP CONNECT /connect/ssh to gateway + headers: x-sandbox-id, x-sandbox-token + | +Gateway + | + 3. Looks up SupervisorSession for sandbox_id + 4. Sends RelayOpen{channel_id} over ConnectSupervisor bidi stream + | + gRPC traverses host -> pasta translation -> container bridge + | +Supervisor inside container + | + 5. Receives RelayOpen, opens new RelayStream RPC back to gateway + 6. Sends RelayInit{channel_id} on the stream + 7. Connects to Unix socket /run/openshell/ssh.sock + 8. Bidirectional bridge: RelayStream <-> Unix socket + | +SSH daemon inside container, Unix socket only + | + 9. Authenticates. Access is gated by the relay chain. + 10. Spawns shell process + 11. Shell enters inner netns via setns(fd, CLONE_NEWNET) + | +User shell in sandbox netns +``` + +The SSH daemon listens on a Unix socket with restrictive permissions. The +published TCP port mapping exists in the container spec for compatibility and +health/debug paths. Normal SSH communication uses the gRPC reverse-connect relay +pattern. + +### Outbound HTTP Request + +```text +User code in inner netns + | + 1. curl https://api.example.com + HTTP_PROXY points at the local sandbox proxy + | + 2. TCP connect to proxy + allowed by iptables as the only ordinary egress destination + | + 3. HTTP CONNECT api.example.com:443 + | +Supervisor proxy in container netns + | + 4. Policy evaluation with process identity + 5. SSRF check + 6. Optional L7 TLS intercept and HTTP method/path inspection + | + 7. If allowed, TCP connect to api.example.com:443 + from the container netns + | + 8. 
Through Podman bridge -> pasta -> host -> internet +``` + +### Supervisor gRPC Callback + +The Podman driver auto-detects the callback endpoint scheme based on whether +TLS client certificates are configured. When the RPM's auto-generated PKI is in +place, the endpoint is `https://host.containers.internal:8080` and the +supervisor connects with mTLS. Without TLS configuration, it falls back to +`http://host.containers.internal:8080`. + +```text +Supervisor in container netns + | + 1. Connects to host.containers.internal: + with mTLS when OPENSHELL_TLS_* paths are set + | + 2. Routed through container default gateway + | + 3. Pasta translates L2 frame -> host L4 socket when rootless backend uses pasta + | + 4. Host TCP socket connects to gateway + | +Gateway + | + 5. TLS handshake when enabled + 6. ConnectSupervisor bidirectional stream established + 7. Heartbeats at the interval accepted by the gateway + 8. Reconnects with exponential backoff on failure + 9. Same gRPC channel reused for RelayStream calls +``` + +The gateway binds to `0.0.0.0` by default in the RPM packaging. mTLS prevents +unauthenticated access even though the gateway is reachable from the network. +Client certificates are auto-generated by `init-pki.sh` on first start and +bind-mounted into sandbox containers by the Podman driver. + +## Differences from the Kubernetes Driver + +| Aspect | Kubernetes | Podman, rootless pasta | +|---|---|---| +| Container or pod IP | Routable cluster-wide | Non-routable from the host in common rootless setups. | +| Network reachability | Pod IPs reachable from gateway | Bridge not reliably routable from host; requires host aliases or published ports. | +| Sandbox to gateway | Direct TCP to Kubernetes service or endpoint | `host.containers.internal` through bridge and rootless backend. | +| SSH transport | Reverse gRPC relay | Reverse gRPC relay. 
| +| Port publishing | Not needed for relay | Ephemeral host port remains in the container spec for compatibility and debug paths. | +| TLS | mTLS via Kubernetes secrets | mTLS via mounted client files, RPM defaults, or explicit configuration. | +| DNS | Kubernetes CoreDNS | Podman bridge DNS through aardvark-dns when DNS is enabled. | +| Network policy | Kubernetes network policy for pod ingress plus supervisor policy | iptables inside inner sandbox netns plus supervisor policy. | +| Supervisor delivery | Kubernetes driver managed pod image or template | OCI image volume mount. | +| Secrets | Kubernetes Secret volume and env vars | Podman `secret_env` for handshake secret, plus mounted TLS files. | + +Both drivers use the same reverse gRPC relay for SSH transport. The most +important Podman-specific difference is network reachability: in rootless +Podman, the bridge network is not reliably routable from the host, so +host-to-container and container-to-host communication must use host aliases, +published ports, or the supervisor relay. + +## Port Assignments + +| Port | Component | Purpose | +|---|---|---| +| `8080` | Gateway | gRPC and HTTP multiplexed default server port. | +| `2222` | Sandbox | Container port mapping default for the SSH compatibility port. | +| `3128` | Sandbox proxy | HTTP CONNECT proxy inside the sandbox network model. | +| `0` | Host | Ephemeral host port requested for the container SSH compatibility port. | + +## Key Source Files + +| File | What it controls | +|---|---| +| `crates/openshell-driver-podman/src/driver.rs` | Bridge network creation, gRPC endpoint auto-detection, rootless checks. | +| `crates/openshell-driver-podman/src/container.rs` | Container spec: network mode, port mappings, host aliases, tmpfs, capabilities. | +| `crates/openshell-driver-podman/src/client.rs` | Podman REST API calls for network ensure/inspect, port discovery, and events. 
| +| `crates/openshell-driver-podman/src/config.rs` | Network name, socket path, SSH port, gateway port defaults. | +| `crates/openshell-sandbox/src/sandbox/linux/netns.rs` | Inner network namespace, veth pair, IP addressing, iptables rules. | +| `crates/openshell-sandbox/src/proxy.rs` | HTTP CONNECT proxy, OPA policy, SSRF protection, L7 inspection. | +| `crates/openshell-sandbox/src/ssh.rs` | SSH daemon on Unix socket and shell process netns entry via `setns()`. | +| `crates/openshell-sandbox/src/supervisor_session.rs` | gRPC `ConnectSupervisor` stream and `RelayStream` for SSH tunneling. | +| `crates/openshell-sandbox/src/grpc_client.rs` | gRPC channel to gateway with mTLS or plaintext, keep-alive, and reconnect behavior. | +| `crates/openshell-server/src/ssh_tunnel.rs` | Gateway-side SSH tunnel, HTTP CONNECT endpoint, relay bridging. | +| `crates/openshell-server/src/supervisor_session.rs` | `SupervisorSessionRegistry`, relay claim/open lifecycle. | +| `crates/openshell-server/src/compute/mod.rs` | `ComputeRuntime::new_podman()` driver initialization. | +| `crates/openshell-core/src/config.rs` | Default constants for ports and network names. | diff --git a/crates/openshell-driver-podman/README.md b/crates/openshell-driver-podman/README.md index f5c856928..a2c65feed 100644 --- a/crates/openshell-driver-podman/README.md +++ b/crates/openshell-driver-podman/README.md @@ -1,132 +1,91 @@ # openshell-driver-podman -Podman-backed compute driver for rootless and single-machine OpenShell -deployments. +The Podman compute driver manages sandbox containers via the Podman REST API +over a Unix socket. It targets single-machine and developer environments where +rootless container isolation is preferred over a full Kubernetes cluster. The +driver runs in-process within the gateway server and delegates all sandbox +isolation enforcement to the `openshell-sandbox` supervisor binary, which is +sideloaded into each container via an OCI image volume mount. 
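The libpod-over-Unix-socket exchange can be sketched with a small request
builder. This is an illustrative helper, not the crate's actual `PodmanClient`
code; the unversioned `/libpod` path prefix and the `Host` header value are
assumptions about what the daemon accepts.

```rust
/// Build an HTTP/1.1 request for a libpod endpoint, to be written over the
/// Podman Unix socket. Illustrative only; the real client layer is richer.
pub fn libpod_request(method: &str, endpoint: &str) -> String {
    // HTTP/1.1 still requires a Host header over a Unix socket, even though
    // the daemon does not route on it.
    format!(
        "{method} /libpod{endpoint} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    )
}
```

For a one-shot call such as a daemon ping, writing these bytes to a
`std::os::unix::net::UnixStream` and reading until EOF is enough.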
-The driver talks to the Podman libpod REST API over a Unix socket. The gateway
-usually constructs it in-process, while the crate also ships an
-`openshell-driver-podman` binary that exposes the shared compute-driver gRPC
-surface for standalone use and tests. Each sandbox is one Podman container, and
-the `openshell-sandbox` supervisor inside that container owns the actual agent
-isolation.
+For a rootless networking deep dive, see [NETWORKING.md](NETWORKING.md).
 
-## Source Map
+## Source File Index
 
 All paths are relative to `crates/openshell-driver-podman/src/`.
 
 | File | Purpose |
 |---|---|
 | `lib.rs` | Crate root and public re-exports. |
-| `main.rs` | Standalone driver binary, CLI/env parsing, and gRPC server startup. |
-| `driver.rs` | Sandbox lifecycle, image pulls, network setup, endpoint detection, GPU checks, and rootless preflight checks. |
-| `client.rs` | Async HTTP/1.1 client for Podman libpod APIs over a Unix socket. |
-| `container.rs` | Podman container spec construction, environment ownership, labels, resources, capabilities, mounts, health checks, port mappings, secrets, and CDI devices. |
-| `config.rs` | `PodmanComputeConfig`, image pull policy parsing, default socket paths, TLS validation, and redacted debug output. |
-| `grpc.rs` | Tonic service adapter from the compute-driver protobuf API to the Rust driver methods. |
-| `watcher.rs` | Initial state sync and live Podman event stream mapping into gateway watch events. |
-
-## Runtime Model
+| `main.rs` | Standalone binary entrypoint, CLI/env parsing, driver construction, and gRPC server startup. |
+| `driver.rs` | Core `PodmanComputeDriver`: sandbox lifecycle, image pulls, endpoint detection, GPU checks, rootless preflight checks, and bridge network setup. |
+| `client.rs` | `PodmanClient`: async HTTP/1.1 client over a Unix socket for the Podman libpod REST API. |
+| `container.rs` | Container spec construction: labels, environment, resource limits, capabilities, seccomp config, health checks, port mappings, image volumes, TLS mounts, and secret injection. |
+| `config.rs` | `PodmanComputeConfig`, `ImagePullPolicy`, default socket path resolution, TLS validation, and redacted `Debug` output. |
+| `grpc.rs` | `ComputeDriverService`: tonic gRPC service mapping compute driver RPCs to driver methods. |
+| `watcher.rs` | Watch stream: initial state sync via container list, then live Podman events mapped to `WatchSandboxesEvent` protobuf messages. |
+
+## Architecture
+
+The Podman driver communicates with the Podman daemon over a Unix socket and
+delegates sandbox isolation to the supervisor binary running inside each
+container.
 
 ```mermaid
-flowchart LR
-    GW["Gateway"] -->|"in-process driver"| D["PodmanComputeDriver"]
-    D -->|"HTTP over Unix socket"| P["Podman API"]
-    P --> C["Sandbox container"]
-    C -->|"entrypoint"| S["openshell-sandbox supervisor"]
-    S -->|"nested netns + policy proxy"| A["restricted agent child"]
-    S -.->|"supervisor relay"| GW
+graph TB
+    CLI["openshell CLI"] -->|gRPC| GW["Gateway Server
(openshell-server)"] + GW -->|in-process| PD["PodmanComputeDriver"] + PD -->|HTTP/1.1
Unix socket| PA["Podman API"] + PA -->|OCI runtime
crun/runc| C["Sandbox Container"] + C -->|image volume
read-only| SV["Supervisor Binary
/opt/openshell/bin/openshell-sandbox"] + SV -->|creates| NS["Nested Network Namespace
veth pair + proxy"] + SV -->|enforces| LL["Landlock + seccomp"] + SV -->|gRPC callback| GW ``` -The container is the outer runtime boundary. Inside it, the supervisor creates a -nested network namespace, starts the CONNECT policy proxy, applies -Landlock/seccomp controls, opens the supervisor relay back to the gateway, and -launches agent commands as the unprivileged sandbox user. - -The driver configures container runtime details only. It does not enforce -OpenShell filesystem, process, network, inference, or credential policy itself. -Those controls stay in `openshell-sandbox` so Podman, Docker, Kubernetes, and VM -runtimes share the same sandbox contract. - -## Driver Comparison +### Driver Comparison | Aspect | Kubernetes | Docker | VM | Podman | |---|---|---|---|---| -| Driver shape | In-process | In-process | Gateway-spawned subprocess | In-process, with standalone binary support | -| Backend | Kubernetes API | Docker daemon | libkrun and gvproxy | Podman libpod REST API over UDS | -| Outer boundary | Pod | Container | MicroVM | Container | -| Supervisor delivery | Supervisor image or init copy into pod volume | Extracted or mounted supervisor binary | Embedded guest bundle | Read-only OCI image volume | -| Callback path | Pod to gateway service or endpoint | Host networking | gvproxy host-loopback NAT | `host.containers.internal` or explicit endpoint | -| SSH transport | Supervisor relay | Supervisor relay | Supervisor relay | Supervisor relay | +| Execution model | In-process | In-process | Standalone subprocess over gRPC UDS | In-process, with standalone binary support | +| Backend | Kubernetes API | Docker daemon | libkrun and gvproxy | Podman REST API over Unix socket | +| Isolation boundary | Pod plus nested sandbox namespace | Container plus nested sandbox namespace | Per-sandbox microVM | Container plus nested sandbox namespace | +| Supervisor delivery | Supervisor image or init copy into pod volume | Extracted or mounted supervisor binary | Embedded 
guest bundle | OCI image volume, read-only | +| Network model | Supervisor creates netns inside pod | Host networking plus nested netns | gvproxy virtio-net | Podman bridge plus nested netns | +| Credential injection | Environment and Kubernetes Secret volume | Environment and mounted TLS bundle | Guest rootfs copy and environment | Podman `secret_env`, environment, and mounted TLS bundle | | GPU support | `nvidia.com/gpu` resource | CDI when daemon supports it | Experimental VFIO path | CDI device request when NVIDIA devices exist | -| State owner | Kubernetes API | Docker daemon | Driver state dir | Podman daemon | - -## Startup Checks - -`PodmanComputeDriver::new` validates the host before accepting sandbox work: - -- Verifies the configured Podman socket path exists, then pings `/_ping`. -- Fetches `/libpod/info` and rejects cgroups v1 because rootless Podman needs - cgroups v2. -- Logs the Podman network backend and whether Podman reports rootless mode. -- Warns when the current user appears to lack `/etc/subuid` or `/etc/subgid` - ranges. This is not a hard failure because some systems provide subordinate - IDs through directory services. -- Creates or reuses the configured bridge network with DNS enabled. -- Auto-detects the sandbox callback endpoint when `OPENSHELL_GRPC_ENDPOINT` is - unset. +| State storage | Kubernetes API | Docker daemon | Driver state dir | Podman daemon | -The default socket path is `$XDG_RUNTIME_DIR/podman/podman.sock` on Linux, with -`/run/user//podman/podman.sock` as the fallback. On macOS it is -`$HOME/.local/share/containers/podman/machine/podman.sock`. - -## Supervisor Delivery - -Podman uses an OCI image volume to mount the supervisor image read-only at -`/opt/openshell/bin`. The supervisor image target in -`deploy/docker/Dockerfile.images` copies the `openshell-sandbox` binary to -`/openshell-sandbox`; mounting that image at `/opt/openshell/bin` makes the -binary available as `/opt/openshell/bin/openshell-sandbox`. 
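The socket path fallback order described above can be sketched as a pure
function. This is a hypothetical helper under the documented order; the
crate's real resolution lives in `config.rs`.

```rust
/// Resolve a rootless Podman socket path: prefer $XDG_RUNTIME_DIR, then
/// fall back to a /run/user/<uid>/ style path. Illustrative helper only.
pub fn default_socket_path(xdg_runtime_dir: Option<&str>, uid: u32) -> String {
    match xdg_runtime_dir {
        // The env var wins when it is set and non-empty.
        Some(dir) if !dir.is_empty() => format!("{dir}/podman/podman.sock"),
        // Otherwise derive the conventional per-user runtime directory.
        _ => format!("/run/user/{uid}/podman/podman.sock"),
    }
}
```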
- -The container spec sets that binary as the entrypoint. This avoids relying on -the sandbox image entrypoint or command, which might otherwise append the -supervisor path as an argument to an image-provided shell. +## Isolation Model -This model keeps the supervisor outside the mutable sandbox image without using -a hostPath-style bind mount. +The Podman driver provides the same protection layers as the other compute +drivers. The driver itself does not implement isolation primitives directly. It +configures the container so that the `openshell-sandbox` supervisor can enforce +them at runtime. -## Container Contract +### Container Security Configuration -The generated libpod create spec sets security-critical fields directly and -lets driver-owned values override template values. +The container spec in `container.rs` sets these security-critical fields: -| Setting | Value | Purpose | +| Setting | Value | Rationale | |---|---|---| -| `user` | `0:0` | The supervisor starts as root inside the container so it can create namespaces, configure mounts, and install sandbox controls. | -| `entrypoint` | `/opt/openshell/bin/openshell-sandbox` | Runs the supervisor directly regardless of the sandbox image entrypoint. | -| `volumes` | Named volume mounted at `/sandbox` | Provides the sandbox workspace. | -| `image_volumes` | Supervisor image mounted read-only at `/opt/openshell/bin` | Sideloads the supervisor binary. | -| `netns` | `bridge` | Attaches the container to the configured Podman bridge network. | -| `portmappings` | Container SSH port to host port `0` | Requests an ephemeral host port for compatibility and health/debug paths. | -| `hostadd` | `host.containers.internal` and `host.openshell.internal` to `host-gateway` | Gives containers stable names for services on the gateway host. | -| `mounts` | Private tmpfs at `/run/netns` | Lets the supervisor create named network namespaces under rootless Podman. 
| -| `no_new_privileges` | `true` | Prevents privilege escalation through exec. | -| `seccomp_profile_path` | `unconfined` | Avoids Podman's container-level profile blocking Landlock/seccomp setup before the supervisor installs its own policy-aware filter. | +| `user` | `0:0` | The supervisor needs root inside the container for namespace creation, proxy setup, Landlock, seccomp, and filesystem preparation. | +| `cap_drop` | Selected unneeded defaults | Podman's default capability set is already restricted. The driver drops capabilities the supervisor does not need. | +| `cap_add` | `SYS_ADMIN`, `NET_ADMIN`, `SYS_PTRACE`, `SYSLOG`, `DAC_READ_SEARCH` | Grants supervisor-only capabilities required for namespace setup, process identity, and bypass diagnostics. | +| `no_new_privileges` | `true` | Prevents privilege escalation after exec. | +| `seccomp_profile_path` | `unconfined` | The supervisor installs its own policy-aware BPF filter. A container-level profile can block Landlock/seccomp syscalls during setup. | +| `mounts` | Private tmpfs at `/run/netns` | Lets the supervisor create named network namespaces in rootless Podman. | -The agent child loses the supervisor's privileges before user code runs. +The restricted agent child does not retain these supervisor privileges. -## Capabilities - -Podman's default container capability set is restricted. The driver drops -capabilities the supervisor does not need and adds the extra ones required for -OpenShell isolation. +### Capability Breakdown | Capability | Purpose | |---|---| -| `SYS_ADMIN` | Namespace creation, Landlock setup, and seccomp filter installation. | -| `NET_ADMIN` | Veth, route, and iptables setup for the inner sandbox namespace. | -| `SYS_PTRACE` | `/proc//exe` inspection and ancestor walking for binary identity. | -| `SYSLOG` | `/dev/kmsg` access for bypass diagnostics. | -| `DAC_READ_SEARCH` | Cross-UID `/proc//fd` reads needed by proxy process identity checks in rootless Podman. 
| +| `SYS_ADMIN` | seccomp filter installation, namespace creation, and Landlock setup. | +| `NET_ADMIN` | Network namespace veth setup, IP address assignment, routes, and iptables. | +| `SYS_PTRACE` | Reading `/proc//exe` and walking process ancestry for binary identity. | +| `SYSLOG` | Reading `/dev/kmsg` for bypass-detection diagnostics. | +| `DAC_READ_SEARCH` | Reading `/proc//fd/` across UIDs so the proxy can resolve the binary responsible for a connection. | The driver intentionally keeps Podman's default `SETUID`, `SETGID`, `CHOWN`, and `FOWNER` capabilities because the supervisor needs them to drop privileges @@ -134,95 +93,150 @@ and prepare writable sandbox directories. It drops unneeded defaults such as `DAC_OVERRIDE`, `FSETID`, `KILL`, `NET_BIND_SERVICE`, `NET_RAW`, `SETFCAP`, `SETPCAP`, and `SYS_CHROOT`. -## Rootless Networking +## Supervisor Sideloading -Podman networking is a stack of cooperating projects: +The supervisor binary is delivered to sandbox containers via Podman's OCI image +volume mechanism, distinct from both the Kubernetes pod-volume approach and the +VM's embedded guest bundle. -| Component | Role | -|---|---| -| Podman | Container runtime and lifecycle orchestration. | -| Netavark | Network setup, bridge creation, IPAM, and firewall rules. | -| aardvark-dns | DNS for Podman bridge networks when DNS is enabled. | -| pasta | User-mode host connectivity for common rootless networking paths. | - -Rootful bridge networking can create host bridges, veth pairs, and firewall -rules directly. Rootless Podman cannot create those host-level interfaces as an -unprivileged user, so common rootless deployments use pasta to translate traffic -between the rootless network namespace and host sockets. The driver does not -configure pasta directly. It asks Podman for bridge mode on the configured -network and logs the backend reported by Podman. 
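The capability sets above can be sketched as plain lists that a spec builder
would hand to Podman. The names and grouping mirror the tables in this
document; the function shape is illustrative, not the crate's actual spec
types.

```rust
/// Assemble (cap_add, cap_drop) lists matching the documented capability
/// plan. Illustrative sketch; real specs may want CAP_-prefixed names.
pub fn capability_plan() -> (Vec<&'static str>, Vec<&'static str>) {
    // Supervisor-only additions on top of Podman's default set.
    let cap_add = vec![
        "SYS_ADMIN",       // namespace creation, Landlock, seccomp install
        "NET_ADMIN",       // veth, routes, iptables in the inner netns
        "SYS_PTRACE",      // /proc/<pid>/exe identity checks
        "SYSLOG",          // /dev/kmsg bypass diagnostics
        "DAC_READ_SEARCH", // cross-UID /proc/<pid>/fd reads for the proxy
    ];
    // Unneeded defaults; SETUID, SETGID, CHOWN, and FOWNER are kept because
    // the supervisor uses them to drop privileges and prepare directories.
    let cap_drop = vec![
        "DAC_OVERRIDE", "FSETID", "KILL", "NET_BIND_SERVICE",
        "NET_RAW", "SETFCAP", "SETPCAP", "SYS_CHROOT",
    ];
    (cap_add, cap_drop)
}
```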
- -The important operational constraint is that the Podman bridge address range is -not a reliable host-routable address in rootless mode. Sandbox callbacks to the -gateway should use `host.containers.internal`, `host.openshell.internal`, or an -explicit `OPENSHELL_GRPC_ENDPOINT`, not the container's bridge IP. - -## Network Layers - -Podman-backed sandboxes have three network layers: - -```text -Host - | - | Gateway listens on the configured bind address and port. - | Rootless Podman may use pasta for host/container translation. - | -Podman bridge network, default "openshell" - | - | Sandbox container default namespace. - | Supervisor, policy proxy, and relay client run here. - | -Inner sandbox network namespace - | - | Created by the supervisor with a veth pair. - | Agent processes run here as the sandbox user. +```mermaid +sequenceDiagram + participant D as PodmanComputeDriver + participant P as Podman API + participant C as Sandbox Container + + D->>P: pull_image(supervisor, "missing") + D->>P: create_container(spec with image_volumes) + Note over P: Podman resolves image_volumes at
libpod layer before OCI spec generation + P->>C: Mount supervisor image at /opt/openshell/bin (read-only) + D->>P: start_container + C->>C: entrypoint: /opt/openshell/bin/openshell-sandbox ``` -The driver creates or reuses the Podman bridge with DNS enabled. The supervisor -then creates the inner namespace, configures a veth pair, and routes ordinary -agent egress through the local CONNECT proxy. The proxy evaluates destination, -binary identity, SSRF protections, TLS/L7 rules, and inference interception. +The `supervisor` target in `deploy/docker/Dockerfile.images` copies the +`openshell-sandbox` binary to `/openshell-sandbox` in the supervisor image. +Mounting that image at `/opt/openshell/bin` makes the binary available as +`/opt/openshell/bin/openshell-sandbox`. + +The container spec sets that binary as the entrypoint. This avoids relying on +the sandbox image entrypoint or command, which might otherwise append the +supervisor path as an argument to an image-provided shell. + +## TLS + +When all three Podman TLS paths are set, the driver treats sandbox callbacks as +mTLS callbacks: -The supervisor uses `nsenter --net=` for namespace operations instead of -`ip netns exec` so rootless containers avoid the sysfs remount path that needs -real host `CAP_SYS_ADMIN`. +- `OPENSHELL_PODMAN_TLS_CA` +- `OPENSHELL_PODMAN_TLS_CERT` +- `OPENSHELL_PODMAN_TLS_KEY` -## Data Paths +The driver validates that the TLS paths are provided as a complete set. Partial +configuration fails early instead of silently falling back to plaintext. -Sandbox-to-gateway callbacks use the endpoint in `OPENSHELL_ENDPOINT`. When the -gateway did not configure one, the Podman driver builds it from the gateway -port and TLS state: +When enabled, the driver: -- `http://host.containers.internal:` when sandbox mTLS is not configured. -- `https://host.containers.internal:` when all three sandbox TLS paths are - configured. +1. Switches the auto-detected endpoint scheme from `http://` to `https://`. +2. 
Bind-mounts the client cert files read-only into the container at + `/etc/openshell/tls/client/`. +3. Sets `OPENSHELL_TLS_CA`, `OPENSHELL_TLS_CERT`, and `OPENSHELL_TLS_KEY` to + the container-side paths. -Interactive sessions use the supervisor relay. The CLI opens a session with the -gateway, the gateway sends `RelayOpen` over the existing supervisor session, and -the supervisor opens a relay stream back to the gateway. The supervisor then -bridges that stream to the Unix socket at `OPENSHELL_SSH_SOCKET_PATH`, usually -`/run/openshell/ssh.sock`. Sandbox SSH does not require direct ingress to the -container. +The supervisor reads these env vars and uses them to establish an mTLS +connection back to the gateway. On SELinux systems, the bind mounts include +Podman's shared relabel option so the container process can read the files. -Agent outbound traffic stays separate. The agent process connects to the local -proxy in the inner namespace. If policy allows the request, the proxy opens the -upstream connection from the container namespace and Podman carries it out -through the configured rootless or rootful network backend. +The RPM packaging auto-generates a self-signed PKI on first start via +`init-pki.sh`. Client certs are placed in the CLI auto-discovery directory +(`~/.config/openshell/gateways/openshell/mtls/`) so the CLI connects with mTLS +without manual configuration. See `deploy/rpm/CONFIGURATION.md` for the full +RPM configuration reference. -## Secrets and Environment +## Network Model -The SSH handshake secret is created as a Podman secret and injected with the -libpod `secret_env` map. That keeps it out of `podman inspect`, although it is -still an environment variable visible to the supervisor process before the -supervisor scrubs it from child environments. 
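The all-or-nothing validation and scheme switch described above can be
sketched as two small functions. The signatures are hypothetical; the crate's
real check lives in `PodmanComputeConfig` validation.

```rust
/// Validate the three TLS paths as a complete set: all present enables
/// mTLS, all absent means plaintext, anything else is a hard error.
pub fn validate_tls(
    ca: Option<&str>,
    cert: Option<&str>,
    key: Option<&str>,
) -> Result<bool, String> {
    match (ca, cert, key) {
        (Some(_), Some(_), Some(_)) => Ok(true), // mTLS callbacks
        (None, None, None) => Ok(false),         // plaintext callbacks
        _ => Err("partial TLS configuration: set CA, cert, and key together".to_string()),
    }
}

/// The auto-detected endpoint scheme follows the validated TLS state.
pub fn callback_scheme(mtls: bool) -> &'static str {
    if mtls { "https" } else { "http" }
}
```

Failing on partial configuration keeps a misconfigured deployment from
silently downgrading the callback channel to plaintext.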
+Sandbox network isolation uses a two-layer approach: a Podman bridge network +for container-to-host communication, and a nested network namespace created by +the supervisor for sandbox process isolation. -The container environment is built in priority order: +```mermaid +graph TB + subgraph Host + GW["Gateway Server
127.0.0.1:8080"] + PS["Podman Socket"] + end + + subgraph Bridge["Podman Bridge Network (10.89.x.x)"] + subgraph Container["Sandbox Container"] + SV["Supervisor
(root in user ns)"] + subgraph NestedNS["Nested Network Namespace"] + SP["Sandbox Process
(sandbox user)"] + VE2["veth1: 10.200.0.2"] + end + VE1["veth0: 10.200.0.1
(CONNECT proxy)"] + SV --- VE1 + VE1 ---|veth pair| VE2 + end + end + + GW -.->|SSH via supervisor relay
gRPC session| SV + SV -->|gRPC callback via
host.containers.internal| GW + SP -->|all egress via proxy| VE1 +``` -1. Sandbox spec and template environment. -2. Driver-controlled values that always overwrite user-supplied values. -3. TLS client paths when sandbox mTLS is enabled. +Key points: + +- Bridge network: created by `client.ensure_network()` with DNS enabled. + Containers on the bridge can see each other at L3, but sandbox processes + cannot because they are isolated inside the nested netns. +- Nested netns: the supervisor creates a private `NetworkNamespace` with a veth + pair. Sandbox processes enter this netns via `setns(fd, CLONE_NEWNET)` in the + `pre_exec` hook, forcing ordinary traffic through the CONNECT proxy. +- Port publishing: the container spec still requests `host_port: 0` for the + configured SSH port. The gateway SSH tunnel uses the supervisor relay rather + than connecting directly to the published port. +- Host gateway: `host.containers.internal:host-gateway` and + `host.openshell.internal:host-gateway` in `/etc/hosts` allow containers to + reach services on the gateway host. +- nsenter: the supervisor uses `nsenter --net=` instead of `ip netns exec` for + namespace operations, avoiding the sysfs remount path that fails in rootless + containers. + +See [NETWORKING.md](NETWORKING.md) for the rootless Podman networking deep dive. + +## Supervisor Relay + +Podman follows the same end-to-end contract as the Kubernetes and VM drivers +for the in-container SSH relay: gateway config to `PodmanComputeConfig` to +sandbox environment to supervisor session registration on that path. + +1. `openshell-core` `Config::sandbox_ssh_socket_path` is copied into + `PodmanComputeConfig::sandbox_ssh_socket_path` when the gateway builds the + in-process driver. +2. `build_env()` in `container.rs` sets `OPENSHELL_SSH_SOCKET_PATH` to that + value, alongside required vars such as `OPENSHELL_ENDPOINT` and + `OPENSHELL_SANDBOX_ID`. 
These driver-controlled entries overwrite template + environment variables to prevent spoofing. +3. The supervisor reads `OPENSHELL_SSH_SOCKET_PATH` and uses it for the Unix + socket the gateway's SSH stack bridges to. + +The standalone `openshell-driver-podman` binary sets the same struct field from +`OPENSHELL_SANDBOX_SSH_SOCKET_PATH`. + +## Credential Injection + +The SSH handshake secret is injected via Podman's `secret_env` API rather than +a plaintext environment variable. + +| Credential | Mechanism | Visible in `inspect`? | Visible in `/proc//environ`? | +|---|---|---|---| +| SSH handshake secret | Podman `secret_env`, created via secrets API and referenced by name | No | Yes, supervisor only, scrubbed from children | +| Sandbox identity | Plaintext env var | Yes | Yes | +| gRPC endpoint | Plaintext env var, override-protected | Yes | Yes | +| Supervisor relay socket path | Plaintext env var, override-protected | Yes | Yes | -Driver-controlled values include: +The `build_env()` function inserts user-supplied variables first, then +unconditionally overwrites all security-critical variables to prevent spoofing +via sandbox templates: - `OPENSHELL_SANDBOX` - `OPENSHELL_SANDBOX_ID` @@ -232,121 +246,137 @@ Driver-controlled values include: - `OPENSHELL_CONTAINER_IMAGE` - `OPENSHELL_SANDBOX_COMMAND` -Sandbox images and templates must not be allowed to spoof identity, callback, -relay, command metadata, or TLS path values. - -## TLS - -When all three Podman TLS paths are set, the driver treats sandbox callbacks as -mTLS callbacks: +The `PodmanComputeConfig::Debug` implementation redacts the handshake secret as +`[REDACTED]`. -- `OPENSHELL_PODMAN_TLS_CA` -- `OPENSHELL_PODMAN_TLS_CERT` -- `OPENSHELL_PODMAN_TLS_KEY` - -The driver validates that these paths are provided as a complete set. Partial -configuration fails early instead of silently falling back to plaintext. 
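The insert-then-overwrite ordering can be sketched with a plain map. Only two
of the documented driver-owned names are shown, and the helper is
illustrative, not the crate's `build_env()`.

```rust
use std::collections::BTreeMap;

/// Template variables go in first; driver-controlled values then overwrite
/// unconditionally, so sandbox templates cannot spoof identity or the
/// callback endpoint. Illustrative sketch only.
pub fn build_env_sketch(
    template: &[(&str, &str)],
    sandbox_id: &str,
    endpoint: &str,
) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    // 1. User-supplied template environment.
    for (k, v) in template {
        env.insert((*k).to_string(), (*v).to_string());
    }
    // 2. Driver-controlled entries always win.
    env.insert("OPENSHELL_SANDBOX_ID".to_string(), sandbox_id.to_string());
    env.insert("OPENSHELL_ENDPOINT".to_string(), endpoint.to_string());
    env
}
```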
- -When enabled, the files are mounted read-only into the container at: - -- `/etc/openshell/tls/client/ca.crt` -- `/etc/openshell/tls/client/tls.crt` -- `/etc/openshell/tls/client/tls.key` +## Sandbox Lifecycle -The driver also sets `OPENSHELL_TLS_CA`, `OPENSHELL_TLS_CERT`, and -`OPENSHELL_TLS_KEY` to those container-side paths. On SELinux systems, the bind -mounts include Podman's shared relabel option so the container process can read -the files. +### Creation Flow -RPM installations generate a local PKI on first start and configure these paths -for the Podman driver. See `deploy/rpm/CONFIGURATION.md` for package-level -details. +```mermaid +sequenceDiagram + participant GW as Gateway + participant D as PodmanComputeDriver + participant P as Podman API -## Sandbox Lifecycle + GW->>D: create_sandbox(DriverSandbox) + D->>D: validate name + id + D->>D: validated_container_name() -Create follows this order: + D->>P: pull_image(supervisor, "missing") + D->>P: pull_image(sandbox_image, policy) -1. Validate the sandbox name and ID, then validate the derived Podman resource - names before creating anything. -2. Pull or verify the supervisor image with the `missing` policy. -3. Pull or verify the sandbox image with `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY`. -4. Create the Podman secret for the SSH handshake secret. -5. Create the workspace volume. -6. Create the container from the generated spec. -7. Start the container. + D->>P: create_secret(handshake) + Note over D: On failure below, rollback secret -Failures roll back resources created earlier in the flow. A container name -conflict removes the new sandbox's workspace volume and handshake secret because -those resources are keyed by sandbox ID, not by the conflicting container. 
+ D->>P: create_volume(workspace) + Note over D: On failure below, rollback volume + secret -Delete is idempotent: + D->>P: create_container(spec) + alt Conflict (409) + D->>P: remove_volume + remove_secret + D-->>GW: AlreadyExists + end + Note over D: On failure below, rollback container + volume + secret -1. Validate the sandbox ID and derived container name. -2. Best-effort inspect the container and warn if its sandbox ID label differs. -3. Stop the container using the configured timeout. -4. Force-remove the container and attached anonymous volumes. -5. Remove the workspace volume derived from the request sandbox ID. -6. Remove the handshake secret derived from the request sandbox ID. + D->>P: start_container + D-->>GW: Ok +``` -If the container is already gone, the driver still attempts volume and secret -cleanup and returns that no container existed. +Each step rolls back previously-created resources on failure. The Conflict path +cleans up the volume and secret because they are keyed by the new sandbox's ID, +not the conflicting container's ID. -## Readiness +### Readiness and Health -The container health check accepts any of these readiness signals: +The container `healthconfig` marks the sandbox healthy when any of these +signals succeeds: -- The legacy marker file `/var/run/openshell-ssh-ready` exists. -- The configured supervisor Unix socket path exists and is a socket. -- Something listens on the configured in-container SSH TCP port. +- Legacy marker file `/var/run/openshell-ssh-ready`. +- `test -S` on the configured supervisor Unix socket path. +- The prior TCP check for a listener on the in-container SSH port. -The Unix socket check is the preferred relay-only path. The TCP port mapping is -kept for compatibility with older readiness and debug flows. +The Unix socket check allows relay-only readiness when the supervisor exposes +the socket without the old marker or published-port signal. 
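The any-of-three readiness check above can be sketched as a single probe function. This is an illustrative sketch, not the actual `healthconfig` command; the function shape is hypothetical, and only the marker path, socket path, and port defaults come from this README.

```rust
use std::net::TcpStream;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

// Hypothetical sketch of the health-check logic: the sandbox counts as
// ready when any one of the three signals succeeds.
fn sandbox_ready(marker: &Path, supervisor_socket: &Path, ssh_port: u16) -> bool {
    // 1. Legacy marker file (/var/run/openshell-ssh-ready) exists.
    if marker.exists() {
        return true;
    }
    // 2. The supervisor Unix socket path exists and is a socket
    //    (the `test -S` equivalent).
    let is_socket = std::fs::metadata(supervisor_socket)
        .map(|m| m.file_type().is_socket())
        .unwrap_or(false);
    if is_socket {
        return true;
    }
    // 3. Something is listening on the in-container SSH TCP port.
    TcpStream::connect(("127.0.0.1", ssh_port)).is_ok()
}

fn main() {
    // Probe with the defaults described in this README.
    let ready = sandbox_ready(
        Path::new("/var/run/openshell-ssh-ready"),
        Path::new("/run/openshell/ssh.sock"),
        2222,
    );
    println!("ready = {ready}");
}
```

Because the signals are OR-ed, a relay-only supervisor that exposes just the Unix socket still reports healthy without publishing the TCP port.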
-## GPU Support +### Deletion Flow -The Podman driver reports GPU support when `/dev/nvidia0` exists on the gateway -host. If a sandbox requests GPU support and that device is missing, validation -fails before container creation. +1. Validate `sandbox_name` and stable `sandbox_id` from `DeleteSandboxRequest`. +2. Best-effort inspect cross-checks the container label when present, but + cleanup remains keyed by the request `sandbox_id`. +3. Best-effort stop, ignoring the stop result. +4. Force-remove the container. +5. Remove workspace volume derived from the request `sandbox_id`, warning on + failure and continuing. +6. Remove handshake secret derived from the request `sandbox_id`, warning on + failure and continuing. -When GPU support is requested, the container spec includes the CDI device -request `nvidia.com/gpu=all`. The host must have NVIDIA CDI specs available to -Podman, and the sandbox image must include user-space libraries required by the -workload. +If the container is already gone during inspect or remove, the driver still +performs idempotent volume and secret cleanup using the request `sandbox_id` and +returns `Ok(false)` for the container-delete result. This prevents leaked +Podman resources after out-of-band container removal or label drift. ## Configuration -The gateway configures the in-process driver from gateway settings and selected -environment variables. The standalone `openshell-driver-podman` binary exposes -the same fields as CLI flags and env vars. - -| Env var | Standalone flag | Default | Purpose | +| Environment Variable | CLI Flag | Default | Description | |---|---|---|---| -| `OPENSHELL_PODMAN_SOCKET` | `--podman-socket` | Platform default socket path | Podman API Unix socket. | -| `OPENSHELL_SANDBOX_IMAGE` | `--sandbox-image` | Gateway default sandbox image | Fallback OCI image for sandboxes that do not specify one. 
| -| `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY` | `--sandbox-image-pull-policy` | `missing` | Pull policy for sandbox images: `always`, `missing`, `never`, or `newer`. | -| `OPENSHELL_GRPC_ENDPOINT` | `--grpc-endpoint` | Auto-detected `host.containers.internal` URL | Callback endpoint injected into sandboxes. | +| `OPENSHELL_PODMAN_SOCKET` | `--podman-socket` | `$XDG_RUNTIME_DIR/podman/podman.sock` on Linux, `$HOME/.local/share/containers/podman/machine/podman.sock` on macOS | Podman API Unix socket path. | +| `OPENSHELL_SANDBOX_IMAGE` | `--sandbox-image` | From gateway config | Default OCI image for sandboxes. | +| `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY` | `--sandbox-image-pull-policy` | `missing` | Pull policy: `always`, `missing`, `never`, or `newer`. | +| `OPENSHELL_GRPC_ENDPOINT` | `--grpc-endpoint` | Auto-detected via `host.containers.internal` | Gateway gRPC endpoint for sandbox callbacks. | | `OPENSHELL_GATEWAY_PORT` | `--gateway-port` | `8080` | Gateway port used for endpoint auto-detection by the standalone binary. | | `OPENSHELL_NETWORK_NAME` | `--network-name` | `openshell` | Podman bridge network name. | -| `OPENSHELL_SANDBOX_SSH_PORT` | `--sandbox-ssh-port` | `2222` | In-container SSH compatibility port. | -| `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` | `--sandbox-ssh-socket-path` | `/run/openshell/ssh.sock` | Supervisor Unix socket path for relay traffic. | -| `OPENSHELL_SSH_HANDSHAKE_SECRET` | `--ssh-handshake-secret` | Gateway-generated or required standalone | Shared secret for the NSSH1 handshake. | +| `OPENSHELL_SANDBOX_SSH_PORT` | `--sandbox-ssh-port` | `2222` | SSH compatibility port inside the container. | +| `OPENSHELL_SSH_HANDSHAKE_SECRET` | `--ssh-handshake-secret` | Required standalone, gateway-generated in-process | Shared secret for the NSSH1 handshake. | | `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS` | `--ssh-handshake-skew-secs` | `300` | Allowed timestamp skew for SSH handshake validation. 
| +| `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` | `--sandbox-ssh-socket-path` | `/run/openshell/ssh.sock` | Standalone driver only: supervisor Unix socket path in `PodmanComputeConfig`. In-gateway Podman uses server `config.sandbox_ssh_socket_path`. | | `OPENSHELL_STOP_TIMEOUT` | `--stop-timeout` | `10` | Container stop timeout in seconds. | -| `OPENSHELL_SUPERVISOR_IMAGE` | `--supervisor-image` | `openshell/supervisor:latest` through the gateway, required standalone | OCI image that supplies `openshell-sandbox`. | +| `OPENSHELL_SUPERVISOR_IMAGE` | `--supervisor-image` | `openshell/supervisor:latest` through the gateway, required standalone | OCI image containing the supervisor binary. | | `OPENSHELL_PODMAN_TLS_CA` | `--podman-tls-ca` | unset | Host path to the CA certificate mounted for sandbox mTLS. | | `OPENSHELL_PODMAN_TLS_CERT` | `--podman-tls-cert` | unset | Host path to the client certificate mounted for sandbox mTLS. | | `OPENSHELL_PODMAN_TLS_KEY` | `--podman-tls-key` | unset | Host path to the client private key mounted for sandbox mTLS. | -## Operational Notes - -- Prefer explicit `OPENSHELL_GRPC_ENDPOINT` only when the auto-detected - `host.containers.internal` endpoint is not appropriate for the deployment. -- Keep the gateway bound to an address that sandbox containers can reach. RPM - deployments bind on `0.0.0.0` and rely on mTLS for access control. -- Avoid relying on Podman bridge IPs from the host in rootless deployments. - Use `host.containers.internal`, `host.openshell.internal`, published ports, or - the supervisor relay. -- Rootless networking behavior depends on the backend reported by Podman. The - driver logs that backend at startup for troubleshooting. -- For sandbox infrastructure changes, run the Podman e2e path and update this - README when the operator-facing contract changes. +## Rootless-Specific Adaptations + +The Podman driver is designed for rootless operation. 
The following adaptations
+matter compared to cluster or rootful runtimes:
+
+1. subuid/subgid preflight check: `check_subuid_range()` in `driver.rs` warns
+   operators if `/etc/subuid` or `/etc/subgid` entries are missing for the
+   current user. This is not a hard error because some systems use LDAP or
+   other mechanisms.
+2. cgroups v2 requirement: the driver refuses to start if cgroups v1 is
+   detected. Rootless Podman requires the unified cgroup hierarchy.
+3. `nsenter` for namespace operations: `openshell-sandbox` uses
+   `nsenter --net=<path>` instead of `ip netns exec` to avoid the sysfs
+   remount path that requires real `CAP_SYS_ADMIN` in the host user namespace.
+4. `DAC_READ_SEARCH` capability: required for the proxy to read
+   `/proc/<pid>/fd/` across UIDs within the user namespace.
+5. `SETUID` and `SETGID` capabilities: kept from Podman's default capability
+   set so `drop_privileges()` can call `setuid()` and `setgid()`.
+6. `host.containers.internal`: used instead of Docker's `host.docker.internal`
+   for container-to-host communication. The driver also injects the
+   OpenShell-owned `host.openshell.internal` alias.
+7. Ephemeral port publishing: the SSH compatibility port uses `host_port: 0`
+   because the bridge network IP is not reliably routable from the host in
+   rootless mode.
+8. tmpfs at `/run/netns`: a private tmpfs lets the supervisor create named
+   network namespaces via `ip netns add`.
+
+## Implementation References
+
+- Gateway integration: `crates/openshell-server/src/compute/mod.rs`
+  (`new_podman` and `PodmanComputeDriver` wiring).
+- Server configuration: `crates/openshell-server/src/lib.rs`
+  (`ComputeDriverKind::Podman` builds `PodmanComputeConfig` including
+  `sandbox_ssh_socket_path` from gateway `Config`).
+- Gateway relay path: `openshell-core` `Config::sandbox_ssh_socket_path` in
+  `crates/openshell-core/src/config.rs`. 
+- SSRF mitigation: `crates/openshell-core/src/net.rs`, + `crates/openshell-sandbox/src/proxy.rs`, and + `crates/openshell-server/src/grpc/policy.rs`. +- Sandbox supervisor: `crates/openshell-sandbox/src/` for Landlock, seccomp, + netns, proxy, and relay behavior shared by all drivers. +- Container engine abstraction: `tasks/scripts/container-engine.sh` for + build/deploy support across Docker and Podman. +- Supervisor image build: `deploy/docker/Dockerfile.images`. diff --git a/docs/reference/sandbox-compute-drivers.mdx b/docs/reference/sandbox-compute-drivers.mdx index 40c0b8223..cc78b3b80 100644 --- a/docs/reference/sandbox-compute-drivers.mdx +++ b/docs/reference/sandbox-compute-drivers.mdx @@ -56,7 +56,7 @@ For GPU-backed Docker sandboxes, configure Docker CDI before starting the gatewa The gateway talks to the Podman API socket. The Podman driver requires Podman 5.x, cgroups v2, rootless networking, and an active Podman user socket. -For maintainer-level implementation details, refer to the [Podman driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-podman/README.md). +For maintainer-level implementation details, refer to the [Podman driver README](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-podman/README.md) and [Podman networking notes](https://github.com/NVIDIA/OpenShell/blob/main/crates/openshell-driver-podman/NETWORKING.md). 
| Option | Environment variable | Description | |---|---|---| From c2402d72fc8334057923a519f1b7ad5ad4690543 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Thu, 7 May 2026 13:17:13 -0700 Subject: [PATCH 3/3] cleanup --- crates/openshell-driver-podman/NETWORKING.md | 18 ------------- crates/openshell-driver-podman/README.md | 28 -------------------- 2 files changed, 46 deletions(-) diff --git a/crates/openshell-driver-podman/NETWORKING.md b/crates/openshell-driver-podman/NETWORKING.md index 1767927db..87eef079c 100644 --- a/crates/openshell-driver-podman/NETWORKING.md +++ b/crates/openshell-driver-podman/NETWORKING.md @@ -416,21 +416,3 @@ published ports, or the supervisor relay. | `2222` | Sandbox | Container port mapping default for the SSH compatibility port. | | `3128` | Sandbox proxy | HTTP CONNECT proxy inside the sandbox network model. | | `0` | Host | Ephemeral host port requested for the container SSH compatibility port. | - -## Key Source Files - -| File | What it controls | -|---|---| -| `crates/openshell-driver-podman/src/driver.rs` | Bridge network creation, gRPC endpoint auto-detection, rootless checks. | -| `crates/openshell-driver-podman/src/container.rs` | Container spec: network mode, port mappings, host aliases, tmpfs, capabilities. | -| `crates/openshell-driver-podman/src/client.rs` | Podman REST API calls for network ensure/inspect, port discovery, and events. | -| `crates/openshell-driver-podman/src/config.rs` | Network name, socket path, SSH port, gateway port defaults. | -| `crates/openshell-sandbox/src/sandbox/linux/netns.rs` | Inner network namespace, veth pair, IP addressing, iptables rules. | -| `crates/openshell-sandbox/src/proxy.rs` | HTTP CONNECT proxy, OPA policy, SSRF protection, L7 inspection. | -| `crates/openshell-sandbox/src/ssh.rs` | SSH daemon on Unix socket and shell process netns entry via `setns()`. 
| -| `crates/openshell-sandbox/src/supervisor_session.rs` | gRPC `ConnectSupervisor` stream and `RelayStream` for SSH tunneling. | -| `crates/openshell-sandbox/src/grpc_client.rs` | gRPC channel to gateway with mTLS or plaintext, keep-alive, and reconnect behavior. | -| `crates/openshell-server/src/ssh_tunnel.rs` | Gateway-side SSH tunnel, HTTP CONNECT endpoint, relay bridging. | -| `crates/openshell-server/src/supervisor_session.rs` | `SupervisorSessionRegistry`, relay claim/open lifecycle. | -| `crates/openshell-server/src/compute/mod.rs` | `ComputeRuntime::new_podman()` driver initialization. | -| `crates/openshell-core/src/config.rs` | Default constants for ports and network names. | diff --git a/crates/openshell-driver-podman/README.md b/crates/openshell-driver-podman/README.md index a2c65feed..d853bb5ea 100644 --- a/crates/openshell-driver-podman/README.md +++ b/crates/openshell-driver-podman/README.md @@ -9,21 +9,6 @@ sideloaded into each container via an OCI image volume mount. For a rootless networking deep dive, see [NETWORKING.md](NETWORKING.md). -## Source File Index - -All paths are relative to `crates/openshell-driver-podman/src/`. - -| File | Purpose | -|---|---| -| `lib.rs` | Crate root and public re-exports. | -| `main.rs` | Standalone binary entrypoint, CLI/env parsing, driver construction, and gRPC server startup. | -| `driver.rs` | Core `PodmanComputeDriver`: sandbox lifecycle, image pulls, endpoint detection, GPU checks, rootless preflight checks, and bridge network setup. | -| `client.rs` | `PodmanClient`: async HTTP/1.1 client over a Unix socket for the Podman libpod REST API. | -| `container.rs` | Container spec construction: labels, environment, resource limits, capabilities, seccomp config, health checks, port mappings, image volumes, TLS mounts, and secret injection. | -| `config.rs` | `PodmanComputeConfig`, `ImagePullPolicy`, default socket path resolution, TLS validation, and redacted `Debug` output. 
| -| `grpc.rs` | `ComputeDriverService`: tonic gRPC service mapping compute driver RPCs to driver methods. | -| `watcher.rs` | Watch stream: initial state sync via container list, then live Podman events mapped to `WatchSandboxesEvent` protobuf messages. | - ## Architecture The Podman driver communicates with the Podman daemon over a Unix socket and @@ -42,19 +27,6 @@ graph TB SV -->|gRPC callback| GW ``` -### Driver Comparison - -| Aspect | Kubernetes | Docker | VM | Podman | -|---|---|---|---|---| -| Execution model | In-process | In-process | Standalone subprocess over gRPC UDS | In-process, with standalone binary support | -| Backend | Kubernetes API | Docker daemon | libkrun and gvproxy | Podman REST API over Unix socket | -| Isolation boundary | Pod plus nested sandbox namespace | Container plus nested sandbox namespace | Per-sandbox microVM | Container plus nested sandbox namespace | -| Supervisor delivery | Supervisor image or init copy into pod volume | Extracted or mounted supervisor binary | Embedded guest bundle | OCI image volume, read-only | -| Network model | Supervisor creates netns inside pod | Host networking plus nested netns | gvproxy virtio-net | Podman bridge plus nested netns | -| Credential injection | Environment and Kubernetes Secret volume | Environment and mounted TLS bundle | Guest rootfs copy and environment | Podman `secret_env`, environment, and mounted TLS bundle | -| GPU support | `nvidia.com/gpu` resource | CDI when daemon supports it | Experimental VFIO path | CDI device request when NVIDIA devices exist | -| State storage | Kubernetes API | Docker daemon | Driver state dir | Podman daemon | - ## Isolation Model The Podman driver provides the same protection layers as the other compute