Hey guys! I’m experiencing some issues with a Rancher v2.5.9 instance deployed as a single Docker container.
The “local” cluster appears to be broken somehow.
A brief explanation:
- local-node is reported as “Active” in the Node view of the Rancher UI
- pods are stuck in Unknown/ContainerCreating status:
root@41c1d38c59fe:~# kubectl get pods -A
NAMESPACE                 NAME                                READY   STATUS              RESTARTS   AGE
cattle-system             rancher-webhook-6cccfd96b5-n97z4    0/1     Unknown             208        27d
cattle-system             rancher-webhook-7578b94d75-zvsbk    0/1     ContainerCreating   0          42m
fleet-system              fleet-agent-d59db746-r9wsv          0/1     Unknown             21         27d
fleet-system              fleet-controller-79554fcbf5-g56fm   0/1     Unknown             41         27d
fleet-system              gitjob-568c57cfb9-v85gk             0/1     Unknown             41         27d
kube-system               coredns-5fd759fd8c-fn9v6            0/1     ContainerCreating   0          3d17h
rancher-operator-system   rancher-operator-595ddc6db9-swbg4   0/1     Unknown             20         27d
- kubectl describe pod can still report the pod status as Running:
root@41c1d38c59fe:~# kubectl describe pod -n cattle-system rancher-webhook-6cccfd96b5-n97z4
Name:           rancher-webhook-6cccfd96b5-n97z4
Namespace:      cattle-system
Priority:       0
Node:           local-node/172.27.0.2
Start Time:     Tue, 10 Aug 2021 08:44:16 +0000
Labels:         app=rancher-webhook
                pod-template-hash=6cccfd96b5
Annotations:    <none>
Status:         Running
IP:
IPs:            <none>
Controlled By:  ReplicaSet/rancher-webhook-6cccfd96b5
Containers:
  rancher-webhook:
    Container ID:   containerd://8e971f03b661ba53a137e4eeb4e97feb6c9fbcf2af65b13b306f3c0893323f0d
    Image:          rancher/rancher-webhook:v0.1.1
    Image ID:       docker.io/rancher/rancher-webhook@sha256:fbde663643e91d99207234ad3f64db1fcfa0c1d3b4c88d41270c7d1735cef471
    Port:           9443/TCP
    Host Port:      0/TCP
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Sun, 22 Aug 2021 05:22:33 +0000
      Finished:     Sun, 22 Aug 2021 05:24:32 +0000
    Ready:          False
    Restart Count:  208
    Environment:
      NAMESPACE:  cattle-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from rancher-webhook-token-zr9sk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rancher-webhook-token-zr9sk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rancher-webhook-token-zr9sk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                    From     Message
  ----    ------          ----                   ----     -------
  Normal  SandboxChanged  31s (x78052 over 11d)  kubelet  Pod sandbox changed, it will be killed and re-created.
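In case anyone wants to check how widespread this is, filtering events by reason should show whether every pod on the node is hitting the same thing. This is just the generic kubectl event filter, nothing Rancher-specific, so treat it as a sketch:

# List all SandboxChanged events across namespaces, newest last
kubectl get events -A --field-selector reason=SandboxChanged --sort-by=.lastTimestamp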
Then, when I took a look at the containerd logs, here’s what I found:
time="2021-09-06T09:27:55.481351424Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:gitjob-568c57cfb9-v85gk,Uid:15cf6793-a4ed-4dde-9e77-33998bd8205c,Namespace:fleet-system,Attempt:221,} failed, error" error="failed to setup network for sandbox \"7e08af4eb616ea15d54ed2b5266ef818de0c4ec9842444267aff507bae47ecee\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:27:55.509505488Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rancher-webhook-6cccfd96b5-n97z4,Uid:b566b2b1-ccb0-4d5a-b1dd-2a1a75d780ee,Namespace:cattle-system,Attempt:300,} failed, error" error="failed to setup network for sandbox \"4935c42e70f227d8988b9f887de19abd7fe5b70244e349047d16fc1b8afaf05d\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:27:58.262757955Z" level=info msg="StopPodSandbox for \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\""
time="2021-09-06T09:27:58.262827930Z" level=info msg="Container to stop \"1437823859871b9acd8449592f54359b5d0facc7b23de2550f75ccf8630864c2\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2021-09-06T09:27:58.287290893Z" level=info msg="TearDown network for sandbox \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\" successfully"
time="2021-09-06T09:27:58.287333268Z" level=info msg="StopPodSandbox for \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\" returns successfully"
time="2021-09-06T09:27:58.288072068Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:rancher-operator-595ddc6db9-swbg4,Uid:c8b83419-30b4-4b8b-9838-0a417af60edc,Namespace:rancher-operator-system,Attempt:298,}"
time="2021-09-06T09:27:58.461892572Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rancher-operator-595ddc6db9-swbg4,Uid:c8b83419-30b4-4b8b-9838-0a417af60edc,Namespace:rancher-operator-system,Attempt:298,} failed, error" error="failed to setup network for sandbox \"4e3f2e1f0f5bdc9970d0bd39399e34f7ce238a6840d5792fbfe69224b3d9f94d\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:28:01.262938313Z" level=info msg="StopPodSandbox for \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\""
time="2021-09-06T09:28:01.263054544Z" level=info msg="Container to stop \"fe05e697e0ea6b189616e9d17f69efa1669fd41abd8dc08b772273ecfc88d8b3\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2021-09-06T09:28:01.288111989Z" level=info msg="TearDown network for sandbox \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\" successfully"
time="2021-09-06T09:28:01.288159164Z" level=info msg="StopPodSandbox for \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\" returns successfully"
time="2021-09-06T09:28:01.288809966Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:fleet-controller-79554fcbf5-g56fm,Uid:f2d982cf-1834-49f1-8841-b83a1ce0e2df,Namespace:fleet-system,Attempt:56,}"
time="2021-09-06T09:28:01.458469927Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:fleet-controller-79554fcbf5-g56fm,Uid:f2d982cf-1834-49f1-8841-b83a1ce0e2df,Namespace:fleet-system,Attempt:56,} failed, error" error="failed to setup network for sandbox \"bbe2454b49b34efa0c2255295918596dbd20945b56cbdaf4709180a829072e24\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
So it seems that pod IPs cannot be allocated, even though there are nowhere near enough containers to fill up the subnet. Now I wonder if I can safely change the pod CIDR, since this is a single Docker instance and not an HA install on a Kubernetes cluster.
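Before touching the CIDR I want to check whether the range is really exhausted or just full of stale leases. If I understand it right, the host-local IPAM used by flannel in k3s keeps one file per allocated IP, which for the embedded k3s should live under /var/lib/cni/networks/cbr0 inside the Rancher container (that path is an assumption on my part), so something like this should show whether all 254 addresses are marked as taken:

# Count the IP lease files host-local IPAM is holding (close to 254 means the range is exhausted)
ls /var/lib/cni/networks/cbr0 | wc -l
# Compare with the number of pods that should actually be scheduled on this node
kubectl get pods -A --field-selector spec.nodeName=local-node --no-headers | wc -l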
I also ran k3s check-config and found this:
Has anybody experienced this before, and does anyone know what needs to be done?