Broken local-cluster in Rancher v2.5.9

Hey guys! I’m experiencing some issues with a Rancher v2.5.9 instance deployed as a single Docker container.

The “local” cluster appears to be broken in some way.
A brief explanation:

  • local-node is reported as “Active” in the Node view in Rancher UI
  • pods have an Unknown/ContainerCreating status:
root@41c1d38c59fe:~# kubectl get pods -A
NAMESPACE                 NAME                                READY   STATUS              RESTARTS   AGE
cattle-system             rancher-webhook-6cccfd96b5-n97z4    0/1     Unknown             208        27d
cattle-system             rancher-webhook-7578b94d75-zvsbk    0/1     ContainerCreating   0          42m
fleet-system              fleet-agent-d59db746-r9wsv          0/1     Unknown             21         27d
fleet-system              fleet-controller-79554fcbf5-g56fm   0/1     Unknown             41         27d
fleet-system              gitjob-568c57cfb9-v85gk             0/1     Unknown             41         27d
kube-system               coredns-5fd759fd8c-fn9v6            0/1     ContainerCreating   0          3d17h
rancher-operator-system   rancher-operator-595ddc6db9-swbg4   0/1     Unknown             20         27d
  • kubectl describe pod reports the pod status as Running, even though the container itself is Terminated:
root@41c1d38c59fe:~#  kubectl describe pod -n cattle-system rancher-webhook-6cccfd96b5-n97z4
Name:           rancher-webhook-6cccfd96b5-n97z4
Namespace:      cattle-system
Priority:       0
Node:           local-node/172.27.0.2
Start Time:     Tue, 10 Aug 2021 08:44:16 +0000
Labels:         app=rancher-webhook
                pod-template-hash=6cccfd96b5
Annotations:    <none>
Status:         Running
IP:
IPs:            <none>
Controlled By:  ReplicaSet/rancher-webhook-6cccfd96b5
Containers:
  rancher-webhook:
    Container ID:   containerd://8e971f03b661ba53a137e4eeb4e97feb6c9fbcf2af65b13b306f3c0893323f0d
    Image:          rancher/rancher-webhook:v0.1.1
    Image ID:       docker.io/rancher/rancher-webhook@sha256:fbde663643e91d99207234ad3f64db1fcfa0c1d3b4c88d41270c7d1735cef471
    Port:           9443/TCP
    Host Port:      0/TCP
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Sun, 22 Aug 2021 05:22:33 +0000
      Finished:     Sun, 22 Aug 2021 05:24:32 +0000
    Ready:          False
    Restart Count:  208
    Environment:
      NAMESPACE:  cattle-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from rancher-webhook-token-zr9sk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rancher-webhook-token-zr9sk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rancher-webhook-token-zr9sk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                    From     Message
  ----    ------          ----                   ----     -------
  Normal  SandboxChanged  31s (x78052 over 11d)  kubelet  Pod sandbox changed, it will be killed and re-created.

Then, when I took a look at the containerd logs, here’s what I found (see the note after the excerpt for where this log file lives in this setup):

time="2021-09-06T09:27:55.481351424Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:gitjob-568c57cfb9-v85gk,Uid:15cf6793-a4ed-4dde-9e77-33998bd8205c,Namespace:fleet-system,Attempt:221,} failed, error" error="failed to setup network for sandbox \"7e08af4eb616ea15d54ed2b5266ef818de0c4ec9842444267aff507bae47ecee\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:27:55.509505488Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rancher-webhook-6cccfd96b5-n97z4,Uid:b566b2b1-ccb0-4d5a-b1dd-2a1a75d780ee,Namespace:cattle-system,Attempt:300,} failed, error" error="failed to setup network for sandbox \"4935c42e70f227d8988b9f887de19abd7fe5b70244e349047d16fc1b8afaf05d\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:27:58.262757955Z" level=info msg="StopPodSandbox for \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\""
time="2021-09-06T09:27:58.262827930Z" level=info msg="Container to stop \"1437823859871b9acd8449592f54359b5d0facc7b23de2550f75ccf8630864c2\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2021-09-06T09:27:58.287290893Z" level=info msg="TearDown network for sandbox \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\" successfully"
time="2021-09-06T09:27:58.287333268Z" level=info msg="StopPodSandbox for \"c7e8c490b3dcdd5111e2c0a10f5751d8efecfe8f1644d516a2fe25a0f5fddca6\" returns successfully"
time="2021-09-06T09:27:58.288072068Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:rancher-operator-595ddc6db9-swbg4,Uid:c8b83419-30b4-4b8b-9838-0a417af60edc,Namespace:rancher-operator-system,Attempt:298,}"
time="2021-09-06T09:27:58.461892572Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rancher-operator-595ddc6db9-swbg4,Uid:c8b83419-30b4-4b8b-9838-0a417af60edc,Namespace:rancher-operator-system,Attempt:298,} failed, error" error="failed to setup network for sandbox \"4e3f2e1f0f5bdc9970d0bd39399e34f7ce238a6840d5792fbfe69224b3d9f94d\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"
time="2021-09-06T09:28:01.262938313Z" level=info msg="StopPodSandbox for \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\""
time="2021-09-06T09:28:01.263054544Z" level=info msg="Container to stop \"fe05e697e0ea6b189616e9d17f69efa1669fd41abd8dc08b772273ecfc88d8b3\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2021-09-06T09:28:01.288111989Z" level=info msg="TearDown network for sandbox \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\" successfully"
time="2021-09-06T09:28:01.288159164Z" level=info msg="StopPodSandbox for \"17a1580e5b926788f4dce483cb8b557d5be3f9cb11a1f182a9f0406af7064fd1\" returns successfully"
time="2021-09-06T09:28:01.288809966Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:fleet-controller-79554fcbf5-g56fm,Uid:f2d982cf-1834-49f1-8841-b83a1ce0e2df,Namespace:fleet-system,Attempt:56,}"
time="2021-09-06T09:28:01.458469927Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:fleet-controller-79554fcbf5-g56fm,Uid:f2d982cf-1834-49f1-8841-b83a1ce0e2df,Namespace:fleet-system,Attempt:56,} failed, error" error="failed to setup network for sandbox \"bbe2454b49b34efa0c2255295918596dbd20945b56cbdaf4709180a829072e24\": failed to allocate for range 0: no IP addresses available in range set: 10.42.0.1-10.42.0.254"

So it seems that IPs cannot be allocated, but there are nowhere near enough containers to fill up that subnet. Now I wonder if I can safely change the range, since this is a single Docker instance and not an HA install on a Kubernetes cluster.
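
Before touching the range, a quick sanity check (just a sketch, assuming flannel’s host-local IPAM keeps its state in /var/lib/cni/networks/cbr0, which matches the cbr0 network K3s sets up) is to compare how many addresses IPAM thinks are reserved with how many pods actually exist:

# Each file here is one reserved address from 10.42.0.0/24,
# give or take the lock/last_reserved_ip bookkeeping files
ls /var/lib/cni/networks/cbr0 | wc -l
# Rough count of pods that should legitimately be holding an address
kubectl get pods -A --no-headers | wc -l

If the first number is close to 254 while the second is only a handful, the range is exhausted by stale reservations rather than by real pods.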

Running k3s check-config, I found this:

Has anybody ever experienced this and know what needs to be done?

Found the solution:
mv /var/lib/cni/networks/cbr0 /var/lib/cni/networks/cbr0.bak

All used IP addresses are stored in this folder, but for some reason they were not being cleaned up. Renaming the folder allowed the CNI to recreate it and repopulate it only with the pending pods that were actually requesting an IP address.
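
For context, that folder belongs to the CNI host-local IPAM plugin: each file is named after a reserved IP and contains the ID of the sandbox that reserved it, plus a couple of bookkeeping files. Roughly what it looks like (the addresses and IDs below are placeholders and will differ on your node):

ls /var/lib/cni/networks/cbr0
# 10.42.0.2  10.42.0.3  ...  last_reserved_ip.0  lock
cat /var/lib/cni/networks/cbr0/10.42.0.2
# prints the container/sandbox ID that reserved this address

Keeping the .bak copy around until all pods are Running again seems like a safe precaution before deleting it.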
