
feat(noderesource): set SELinuxChangePolicy=MountOption on SeiNode pods #202

Merged

bdchatham merged 2 commits into main from chore/pod-selinux-change-policy on May 7, 2026

@bdchatham
Collaborator

Summary

Set `pod.Spec.SecurityContext.SELinuxChangePolicy` to `MountOption` (`spec.securityContext.seLinuxChangePolicy` in the pod manifest) on every pod the SeiNode controller builds.
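
A minimal Go sketch of what setting the field can look like inside a pod builder. In a real controller the types come from `k8s.io/api/core/v1` (`PodSecurityContext.SELinuxChangePolicy` and the `SELinuxChangePolicyMountOption` constant); they are redeclared locally here only so the example compiles on its own. `withMountOptionRelabel` is a hypothetical helper name, not the controller's actual function.

```go
package main

import "fmt"

// Local stand-ins for the k8s.io/api/core/v1 types, trimmed to the fields
// this sketch touches. Assumption: the real controller uses corev1 directly.
type PodSELinuxChangePolicy string

const (
	SELinuxChangePolicyRecursive   PodSELinuxChangePolicy = "Recursive"
	SELinuxChangePolicyMountOption PodSELinuxChangePolicy = "MountOption"
)

type PodSecurityContext struct {
	SELinuxChangePolicy *PodSELinuxChangePolicy
}

type PodSpec struct {
	SecurityContext *PodSecurityContext
}

// withMountOptionRelabel (hypothetical) creates the pod-level security
// context only if it is missing, reusing an existing one so other fields
// set elsewhere are preserved, then sets the policy to MountOption.
func withMountOptionRelabel(spec *PodSpec) {
	if spec.SecurityContext == nil {
		spec.SecurityContext = &PodSecurityContext{}
	}
	policy := SELinuxChangePolicyMountOption
	spec.SecurityContext.SELinuxChangePolicy = &policy
}

func main() {
	spec := &PodSpec{}
	withMountOptionRelabel(spec)
	fmt.Println(*spec.SecurityContext.SELinuxChangePolicy) // MountOption
}
```

Copying the constant into a local variable and taking its address is the usual way to populate a pointer-typed optional field in Kubernetes API structs.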

Without this, on SELinux-enforcing nodes (Bottlerocket on EKS), every pod start triggers a recursive setxattr walk over the entire data PVC to apply the per-pod MCS label. On the pacific-1 archive's 40 TiB, 8.2M-file xfs volume that walk took ~20 minutes: the pod sat in Init:0/2 (PodInitializing), and CreateContainerError "name reservation" errors followed downstream.

With MountOption, the volume is instead mounted with an SELinux `context=` option, so the kernel applies the label to the whole mount in milliseconds, independent of volume size. The feature is GA in K8s 1.33+ (SELinuxMount feature gate).

Test plan

  • go test ./internal/noderesource/... passes
  • CI lint + test green
  • Verify on the next archive pod recreation (expected: the volume-relabel phase of boot drops from ~22 min to <1 min)
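
The unit-test item could be backed by a check along these lines. This is a hypothetical sketch over trimmed stand-in types (the real tests under `./internal/noderesource/...` are not shown in this PR and may look different); the pod names in the example are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Trimmed stand-ins for the Kubernetes pod types (assumption: only the
// fields this check needs).
type PodSELinuxChangePolicy string

const SELinuxChangePolicyMountOption PodSELinuxChangePolicy = "MountOption"

type PodSecurityContext struct {
	SELinuxChangePolicy *PodSELinuxChangePolicy
}

type Pod struct {
	Name            string
	SecurityContext *PodSecurityContext
}

// podsMissingMountOption returns the sorted names of pods whose pod-level
// SELinuxChangePolicy is unset or set to anything other than MountOption.
func podsMissingMountOption(pods []Pod) []string {
	var missing []string
	for _, p := range pods {
		sc := p.SecurityContext
		if sc == nil || sc.SELinuxChangePolicy == nil ||
			*sc.SELinuxChangePolicy != SELinuxChangePolicyMountOption {
			missing = append(missing, p.Name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	mo := SELinuxChangePolicyMountOption
	pods := []Pod{
		{Name: "archive-0", SecurityContext: &PodSecurityContext{SELinuxChangePolicy: &mo}},
		{Name: "rpc-0"}, // hypothetical pod built without the policy
	}
	fmt.Println(podsMissingMountOption(pods)) // [rpc-0]
}
```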

bdchatham and others added 2 commits May 7, 2026 11:27
Tells kubelet/CSI to apply the pod's SELinux context as a per-mount
overlay (one mount syscall option) instead of the default behavior
of recursively rewriting xattrs on every file in the data PVC.

Concrete impact for archive nodes (40 TiB xfs, 8.2M files):
  - Before: every pod creation triggers a full setxattr walk →
    ~20 minutes of CreateContainer hang while the container runtime
    relabels each file to match the pod's randomized MCS pair.
  - After:  the kernel applies the context at mount time, regardless
    of filesystem size → milliseconds. Subsequent pod recreations
    (different MCS) just remount with a different context= option,
    same instant cost.
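
A back-of-envelope check of the before/after numbers, using only the figures quoted above (8.2M files, ~20 minutes): the recursive walk averages roughly 146 µs per file, and because it touches every file, its cost grows with file count while the mount-option path does not. Treating the walk as strictly linear is an assumption; real throughput also depends on disk and xfs metadata performance.

```go
package main

import (
	"fmt"
	"time"
)

// perFileCost derives the average relabel time per file from an observed
// total walk duration and file count (integer nanosecond division).
func perFileCost(total time.Duration, files int64) time.Duration {
	return total / time.Duration(files)
}

// recursiveRelabelEstimate scales a per-file cost linearly with file count,
// modelling a walk that must setxattr every file.
func recursiveRelabelEstimate(perFile time.Duration, files int64) time.Duration {
	return perFile * time.Duration(files)
}

func main() {
	const files = 8_200_000
	observed := 20 * time.Minute

	pf := perFileCost(observed, files)
	fmt.Println(pf) // 146.341µs

	// Under the linear assumption, twice the files means twice the walk.
	fmt.Println(recursiveRelabelEstimate(pf, 2*files).Round(time.Second)) // 40m0s
}
```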

Mechanism: SELinux mount option `context=<label>` makes every file
under a mount appear to have that label to the kernel's SELinux
subsystem, regardless of on-disk xattrs. No per-file work, no disk
writes — a per-mount-instance overlay.
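
To make the `context=<label>` mechanism concrete, here is a hypothetical Go sketch that renders such a mount option for a pod-specific label. The `system_u:object_r:container_file_t` user/role/type and the c0..c1023 category range are assumptions based on common container-selinux defaults, not values read from this cluster. The label is quoted because mount(8) requires quoting when the value contains commas, which MCS levels like `s0:c127,c456` do.

```go
package main

import (
	"fmt"
	"math/rand"
)

// seLinuxLabel is a hypothetical representation of a full SELinux context.
type seLinuxLabel struct {
	User, Role, Type, Level string
}

func (l seLinuxLabel) String() string {
	return fmt.Sprintf("%s:%s:%s:%s", l.User, l.Role, l.Type, l.Level)
}

// randomMCSLevel picks two distinct categories from c0..c1023 (assumed
// range) and orders them low-to-high, e.g. "s0:c12,c900". intn is injected
// so callers can pass rand.Intn-style sources or a deterministic stub.
func randomMCSLevel(intn func(int) int) string {
	a := intn(1024)
	b := intn(1023)
	if b >= a {
		b++ // skip a, so b is uniform over the remaining 1023 categories
	}
	if a > b {
		a, b = b, a
	}
	return fmt.Sprintf("s0:c%d,c%d", a, b)
}

// contextMountOption renders the label as a quoted context= mount option so
// the comma inside the MCS level is not read as an option separator.
func contextMountOption(l seLinuxLabel) string {
	return fmt.Sprintf("context=%q", l.String())
}

func main() {
	r := rand.New(rand.NewSource(1))
	label := seLinuxLabel{
		User:  "system_u",
		Role:  "object_r",
		Type:  "container_file_t",
		Level: randomMCSLevel(r.Intn),
	}
	fmt.Println(contextMountOption(label))
}
```

Each recreation with a new MCS pair just yields a different option string on a fresh mount, which is why the cost stays constant regardless of how many files the volume holds.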

Requires K8s 1.33+ (SELinuxMount feature GA) and a CSI driver that
supports the SELinuxMount capability; EBS CSI v1.30+ does. Cluster
runs 1.34, EBS CSI is current — both prerequisites met.
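
The two prerequisites reduce to a simple gate. A hypothetical Go sketch, assuming the control-plane version string (as printed by e.g. `kubectl version`) and a boolean for whether the CSI driver's `CSIDriver` object sets `spec.seLinuxMount: true`; the 1.33 threshold is the one stated in this commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts major and minor from versions such as "v1.34.2"
// or "v1.34.2-eks-abc123"; anything after the minor component is ignored.
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// meetsMountOptionPrereqs reports whether the cluster is at least 1.33 and
// the CSI driver advertises SELinux mount support.
func meetsMountOptionPrereqs(serverVersion string, csiSELinuxMount bool) (bool, error) {
	major, minor, err := parseMajorMinor(serverVersion)
	if err != nil {
		return false, err
	}
	versionOK := major > 1 || (major == 1 && minor >= 33)
	return versionOK && csiSELinuxMount, nil
}

func main() {
	ok, err := meetsMountOptionPrereqs("v1.34.1", true)
	fmt.Println(ok, err) // true <nil>
}
```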

Validated on pacific-1-archive-0 in prod: the same 40 TiB volume
consistently triggered the 20-minute relabel walk on every pod
recreation. With this change, future recreations skip that path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham merged commit 0ba241e into main on May 7, 2026
2 checks passed
bdchatham deleted the chore/pod-selinux-change-policy branch on May 7, 2026 at 18:57