From 48d2ecb2adf385cca72a3c41613f0532c26b5e53 Mon Sep 17 00:00:00 2001
From: Komh
Date: Sat, 2 May 2026 16:47:43 +0000
Subject: [PATCH] [configure] Tuning kubelet image garbage collection to avoid
 frequent image re-pulls

---
 ...ection_to_avoid_frequent_image_re_pulls.md | 125 ++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 docs/en/solutions/Tuning_kubelet_image_garbage_collection_to_avoid_frequent_image_re_pulls.md

diff --git a/docs/en/solutions/Tuning_kubelet_image_garbage_collection_to_avoid_frequent_image_re_pulls.md b/docs/en/solutions/Tuning_kubelet_image_garbage_collection_to_avoid_frequent_image_re_pulls.md
new file mode 100644
index 00000000..fe3644c3
--- /dev/null
+++ b/docs/en/solutions/Tuning_kubelet_image_garbage_collection_to_avoid_frequent_image_re_pulls.md
@@ -0,0 +1,125 @@
---
kind:
  - Information
products:
  - Alauda Container Platform
ProductsVersion:
  - 4.1.0,4.2.x
---

# Tuning kubelet image garbage collection to avoid frequent image re-pulls

## Issue

The kubelet is removing container images that are still in active use on the node, which forces the runtime to re-pull them the next time a pod that references them is scheduled. The symptoms are:

- Frequent `Pulling image` events for images that were pulled within the last few hours.
- A noticeable spike in registry-egress network traffic, sometimes large enough to saturate the cluster's pull-through cache or trip rate limits on an external registry.
- The node's `/var/lib/containers/storage/` (or `/var/lib/containerd/`) usage hovers near a tight ceiling and oscillates as the kubelet's image GC kicks in repeatedly.

## Root Cause

The kubelet has two thresholds that drive image garbage collection: `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent`. They define the percentage of the imagefs that the kubelet considers "full enough to act" and "low enough to stop", respectively.
When the high threshold is set too low for the workload's image churn (the defaults are 85 for the high mark and 80 for the low mark), the kubelet enters and exits a GC cycle frequently, evicting images that are still hot.

Two situations make the default values especially mismatched:

- The node has a relatively small dedicated image partition, so even a moderate working set crosses 85% quickly.
- The workload references a large catalog of images (multi-tenant clusters, build farms, AI model containers), so the **set of images that should stay** is far larger than what fits comfortably under the high mark.

In either case, the answer is to raise the high threshold (and, if needed, lower the low threshold to widen the band between them) so that the kubelet only collects when the disk is genuinely under pressure.

## Resolution

The two thresholds live in each kubelet's running configuration. There are two places they can be set; pick the one the cluster's node lifecycle is built around.

### Per-node kubelet configuration file

On a self-managed node where the kubelet reads its configuration from a file (typically `/var/lib/kubelet/config.yaml` or `/etc/kubernetes/kubelet-config.yaml`), edit the file and restart the kubelet:

```yaml
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 90
imageGCLowThresholdPercent: 75
# ...other fields unchanged
```

Apply with:

```bash
systemctl restart kubelet
```

Repeat on every node whose imagefs sees the same pressure. Drain the node first if production traffic is on it.

### Cluster-managed kubelet configuration

On a cluster whose nodes are managed declaratively (the cluster operator owns `/var/lib/kubelet/config.yaml`), the same fields are exposed through the platform's node-configuration custom resource.
The shape varies, but the keys are the same:

```yaml
# Example: a node-config CR scoped to a node pool
spec:
  kubeletConfiguration:
    imageGCHighThresholdPercent: 90
    imageGCLowThresholdPercent: 75
```

Applying the CR triggers the operator to re-render `/var/lib/kubelet/config.yaml` on each affected node and restart the kubelet. This is a node-by-node rollout, so plan a maintenance window if it would otherwise contend with workload SLAs.

### Verifying the live values

After restarting the kubelet, confirm the running config picked up the new thresholds. The kubelet serves its merged configuration at the `/configz` endpoint, which is most easily reached through the API server's node proxy (replace `<node-name>` with the node to inspect):

```bash
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | jq '.kubeletconfig | { high: .imageGCHighThresholdPercent, low: .imageGCLowThresholdPercent }'
```

If the API server's node proxy is blocked, read the file directly from a node-debug shell (note that `kubectl debug node/` mounts the host filesystem under `/host`; `<tool-image>` is any image that ships `grep`):

```bash
kubectl debug node/<node-name> -it --profile=sysadmin --image=<tool-image> -- \
  grep -E 'imageGC(High|Low)ThresholdPercent' /host/var/lib/kubelet/config.yaml
```

### Choosing values

There is no universal correct pair. A starting heuristic:

| Disk pressure profile | high | low |
|---|---|---|
| Plenty of imagefs headroom | 90 | 75 |
| Tight imagefs, large image catalog | 85 | 70 |
| Very tight imagefs, must keep images | 95 | 85 |

Then watch over a week:

- If the kubelet runs GC more than once per hour, raise the high threshold (or lower the low threshold to widen the band, so each pass frees more).
- If `imagefs` ever crosses 95% without GC running, lower both thresholds.

## Diagnostic Steps

1. Confirm the node is the bottleneck and not the registry.
From a control-plane host:

   ```bash
   kubectl get events -A --field-selector reason=Pulling \
     -o jsonpath='{range .items[*]}{.lastTimestamp}{" "}{.involvedObject.namespace}/{.involvedObject.name}{" "}{.message}{"\n"}{end}' \
     | sort | tail -50
   ```

   Repeated pulls of the same image reference, especially within the kubelet's GC interval, point at the node's GC rather than the workload.

2. Capture the kubelet's GC log lines on the affected node. The kubelet logs every eviction at `info` verbosity:

   ```bash
   journalctl -u kubelet --since "1 hour ago" | grep -E 'image_gc|imageGC|Removing image'
   ```

3. Check the imagefs occupancy at the moment GC fires. The `imageFs` stats are reported through the kubelet's stats endpoint:

   ```bash
   kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
     | jq '.node.fs, .node.runtime.imageFs'
   ```

4. If the eviction list looks correct (large, rarely used images going first) but pulls still feel excessive, inspect the workload: a Deployment whose pods restart every few minutes will keep re-fetching its image regardless of the GC settings.
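The band between the two thresholds determines both when GC fires and how much it frees per pass. As a rough sanity check on candidate values, the trigger math can be sketched in plain shell; the capacity and usage figures below are made-up examples, not cluster data:

```shell
# Hypothetical numbers: a 100 GiB imagefs that is 92 GiB full,
# checked against the high=90 / low=75 pair from the table above.
capacity_bytes=$((100 * 1024 * 1024 * 1024))
used_bytes=$((92 * 1024 * 1024 * 1024))
high=90
low=75

# Integer percentage of the imagefs currently in use.
usage_pct=$((used_bytes * 100 / capacity_bytes))
echo "imagefs usage: ${usage_pct}%"

if [ "$usage_pct" -ge "$high" ]; then
  # GC fires and removes images (least recently used first)
  # until usage drops back below the low threshold.
  target_bytes=$((capacity_bytes * low / 100))
  to_free=$((used_bytes - target_bytes))
  echo "GC fires: must free $((to_free / 1024 / 1024 / 1024)) GiB"
else
  echo "GC idle: below the ${high}% high threshold"
fi
```

With these numbers the check reports 92% usage, so GC fires and must free 17 GiB (92 GiB down to the 75 GiB low mark). If that reclaim amount is small relative to the node's hourly image churn, GC will re-trigger often, which is the signal to widen the band.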