TL;DR
Where is the kubelet config file on Rancher 2.6.9 (RKE1), like the one described in "Set Kubelet parameters via a config file" in the Kubernetes docs?
Can I manage it? Does this file exist?
I didn’t find it in /var/lib/kubelet:
# pwd
/var/lib/kubelet
# ls -lha
total 16K
drwxr-xr-x 9 root root 185 Sep 5 13:20 .
drwxr-xr-x. 42 root root 4.0K Sep 22 15:50 ..
-rw------- 1 root root 62 Sep 5 13:20 cpu_manager_state
drwxr-xr-x 2 root root 45 Nov 1 11:27 device-plugins
-rw------- 1 root root 61 Sep 5 13:20 memory_manager_state
drwxr-xr-x 2 root root 44 Sep 5 13:20 pki
drwxr-x--- 2 root root 6 Sep 5 13:20 plugins
drwxr-x--- 2 root root 6 Sep 5 13:20 plugins_registry
drwxr-x--- 2 root root 26 Nov 1 11:27 pod-resources
drwxr-x--- 11 root root 4.0K Oct 24 23:57 pods
drwxr-xr-x 2 root root 6 Sep 5 13:20 volumeplugins
Explanation
We recently upgraded Kubernetes to v1.24.4-rancher1-1 and Rancher to 2.6.9. Everything worked fine at first, but we have since noticed a new behavior: if an image is too big, or takes more than 2 minutes to download, Kubernetes raises an ErrImagePull.
To work around this error, I have to log in to the cluster and run docker pull <image> manually.
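The manual workaround can be scripted. This is only a sketch of what I do by hand: the function name prepull_pod_images is made up for illustration, and it assumes kubectl and docker are both usable on the node where the pod is scheduled.

```shell
# Pull every container image referenced by a pod, so the kubelet
# finds the images already present locally on its next attempt.
# Assumption: runs on the node where the pod is scheduled.
prepull_pod_images() {
    ns="$1"
    pod="$2"
    # jsonpath prints the image names separated by spaces
    for image in $(kubectl get pod -n "$ns" "$pod" -o jsonpath='{.spec.containers[*].image}'); do
        docker pull "$image" || return 1
    done
}

# Usage:
#   prepull_pod_images mobile test-imagepullback-c7fc59d86-gwtc7
```

This only hides the symptom; it does not change the timeout itself.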
Error: ImagePullBackOff
~ ❯ kubectl get pods -n mobile test-imagepullback-c7fc59d86-gwtc7
NAME READY STATUS RESTARTS AGE
test-imagepullback-c7fc59d86-gwtc7 0/1 ContainerCreating 0 2m
~ ❯ kubectl get pods -n mobile test-imagepullback-c7fc59d86-gwtc7
NAME READY STATUS RESTARTS AGE
test-imagepullback-c7fc59d86-gwtc7 0/1 ErrImagePull 0 2m1s
~ ❯ kubectl get pods -n mobile test-imagepullback-c7fc59d86-gwtc7
NAME READY STATUS RESTARTS AGE
test-imagepullback-c7fc59d86-gwtc7 0/1 ImagePullBackOff 0 2m12s
While investigating, we found that the error is caused by a timeout in the kubelet’s runtime requests (2 minutes by default, according to the kubelet reference in the Kubernetes docs), which can be raised with the flag --runtime-request-timeout duration. However, after changing cluster.yaml with the parameters below, nothing changes:
[...]
  kubelet:
    extra_args:
      runtime-request-timeout: 10m
    fail_swap_on: false
[...]
The running process shows that the parameter is passed to the kubelet:
# ps -ef | grep runtime-request-timeout
root 7286 7267 0 Nov01 ? 00:00:00 /bin/bash /opt/rke-tools/entrypoint.sh kubelet {...} --runtime-request-timeout=10m {...}
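To check single flags without reading the whole command line, a small helper like the one below can be used. It is a sketch: flag_value is a hypothetical name, and it only handles the --flag=value form that appears in the ps output above.

```shell
# Print the value of a long flag (e.g. runtime-request-timeout) from a
# command line read on stdin. Only the --flag=value form is handled.
flag_value() {
    flag="$1"
    # one argument per line, then strip the "--flag=" prefix and print the rest
    tr ' ' '\n' | sed -n "s/^--${flag}=//p" | head -n 1
}

# Usage on a node (the [k] keeps grep from matching itself):
#   ps -ef | grep '[k]ubelet' | flag_value runtime-request-timeout
#   ps -ef | grep '[k]ubelet' | flag_value config
```

If the second command prints nothing, the kubelet was started without a --config flag, which would suggest there is no kubelet config file in use.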
According to the official page, this flag is deprecated, which may explain this behavior. To change the value, I apparently need to set a parameter named runtimeRequestTimeout inside a config file.
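One way to see which value the kubelet actually uses, whether or not a config file exists, is its configz endpoint, reachable through the API server's node proxy. A sketch, assuming your kubeconfig is allowed to proxy to nodes; the helper name effective_runtime_timeout and the NODE variable are placeholders of mine:

```shell
# Print the runtimeRequestTimeout value from the kubelet's configz JSON
# read on stdin. Uses grep/cut so that jq is not required.
effective_runtime_timeout() {
    grep -o '"runtimeRequestTimeout": *"[^"]*"' | cut -d'"' -f4
}

# Usage (NODE is the node name as shown by "kubectl get nodes"):
#   kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" | effective_runtime_timeout
```

If this prints the value set in extra_args, the flag is being honored, and the pull timeout probably comes from somewhere else.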
So I have some questions:
- Where do I change it?
- Does this file exist in Rancher, or do I need to create it?
- Is there a way to work around this with another parameter in extra_args?
- Why is this happening now? Is it because of the dockershim deprecation?
Docker and Kubernetes versions
# docker version
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
Go version: go1.16.12
Git commit: e91ed57
Built: Mon Dec 13 11:45:41 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.21
API version: 1.41 (minimum version 1.12)
Go version: go1.18.7
Git commit: 3056208
Built: Tue Oct 25 18:02:38 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.9
GitCommit: 1c90a442489720eec95342e1789ee8a5e1b9536f
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
# kubectl version --short
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.24.4
I would be grateful for any help, and I hope this also helps others solve this annoying issue.