feat(noderesource): set SELinuxChangePolicy=MountOption on SeiNode pods#202
Merged
feat(noderesource): set SELinuxChangePolicy=MountOption on SeiNode pods#202
Conversation
Tells kubelet/CSI to apply the pod's SELinux context as a per-mount
overlay (one mount syscall option) instead of the default behavior
of recursively rewriting xattrs on every file in the data PVC.
Concrete impact for archive nodes (40 TiB xfs, 8.2M files):
- Before: every pod creation triggers a full setxattr walk →
~20 minutes of CreateContainer hang while runc relabels each
file to match the pod's randomized MCS pair.
- After: the kernel applies the context at mount time, regardless
of filesystem size → milliseconds. Subsequent pod recreations
(different MCS) just remount with a different context= option,
same instant cost.
Mechanism: SELinux mount option `context=<label>` makes every file
under a mount appear to have that label to the kernel's SELinux
subsystem, regardless of on-disk xattrs. No per-file work, no disk
writes — a per-mount-instance overlay.
Requires K8s 1.33+ (SELinuxMount feature GA) and a CSI driver that
supports the SELinuxMount capability; EBS CSI v1.30+ does. Cluster
runs 1.34, EBS CSI is current — both prerequisites met.
Validated on pacific-1-archive-0 in prod: same 40 TiB volume
consistently triggered the 20-minute relabel walk on every pod
recreation. With this change, future recreations skip that path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Set
pod.spec.securityContext.SELinuxChangePolicy = MountOptionon every pod the SeiNode controller builds.Without this, on SELinux-enforcing nodes (Bottlerocket on EKS), the kubelet does a recursive
setxattrwalk over the entire data PVC on pod start to apply the per-pod MCS label. On the pacific-1 archive's 40 TiB / 8.2M-file xfs volume that walk took ~20 minutes — pod stuck inInit:0/2 PodInitializing, withCreateContainerError "name reservation"symptoms downstream.With
MountOption, the kernel applies the SELinux context as a per-mount overlay in milliseconds. Feature is GA in K8s 1.33+ (SELinuxMountfeature gate).Test plan
go test ./internal/noderesource/...passes