Cattle-pods failing

asp · October 24, 2019, 2:46pm

Hi,

I have set up (at least I think so) a 3-node cluster via RKE, following the HA setup.
All 3 nodes have the roles worker, controlplane and etcd.

Rancher’s WebUI is available, but I have some system workloads which are failing:

NAME                                    READY   STATUS             RESTARTS   AGE   IP              NODE        NOMINATED NODE   READINESS GATES
cattle-cluster-agent-67d77bf855-lgh56   0/1     CrashLoopBackOff   17         63m   10.42.21.71     rancher-2   <none>           <none>
cattle-node-agent-9w4sl                 0/1     CrashLoopBackOff   17         63m   192.168.1.100   rancher-0   <none>           <none>
cattle-node-agent-9xdtk                 0/1     Error              17         63m   192.168.1.101   rancher-1   <none>           <none>
cattle-node-agent-wtjth                 0/1     CrashLoopBackOff   17         63m   192.168.1.102   rancher-2   <none>           <none>
rancher-8d5b898f-6crb5                  1/1     Running            0          94m   10.42.21.70     rancher-2   <none>           <none>
rancher-8d5b898f-gngf9                  1/1     Running            0          94m   10.42.232.3     rancher-0   <none>           <none>
rancher-8d5b898f-sc85m                  1/1     Running            0          94m   10.42.109.195   rancher-1   <none>           <none>

What are these cattle pods doing?
How can I fix this issue?

The logs look like this:

kubectl -n cattle-system logs -f  cattle-node-agent-9xdtk
INFO: Environment: CATTLE_ADDRESS=10.0.2.15 CATTLE_AGENT_CONNECT=true CATTLE_CA_CHECKSUM=b264c7d775539436c7b201e78070e81d3a6cf2a9772c89e0a4fdf83e93d3190d CATTLE_CLUSTER=false CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=rancher-1 CATTLE_SERVER=https://rancher.my.org
INFO: Using resolv.conf: nameserver 10.120.136.116 search t-mobile.de myomain
INFO: https://rancher.my.org/ping is accessible
INFO: rancher.my.org resolves to 91.195.240.126
parse error: Invalid numeric literal at line 1, column 10

kubectl -n cattle-system logs -f  cattle-cluster-agent-67d77bf855-lgh56
INFO: Environment: CATTLE_ADDRESS=10.42.21.71 CATTLE_CA_CHECKSUM=b264c7d775539436c7b201e78070e81d3a6cf2a9772c89e0a4fdf83e93d3190d CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-67d77bf855-lgh56 CATTLE_SERVER=https://rancher.my.org
INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local t-mobile.de myDomain options ndots:5
INFO: https://rancher.my.org/ping is accessible
INFO: rancher.my.org resolves to 91.195.240.126
parse error: Invalid numeric literal at line 1, column 10

Thanks, Andreas

asp · October 25, 2019, 10:15am

Small update: today I rebooted all 3 RKE servers and noticed that the nodes had run out of memory, or at least they showed excessive I/O wait because there was almost no free RAM. I extended the RAM of the VMs, and now the node agents are online. But the cattle-cluster-agent still shows the same issue. The node where cattle-cluster-agent is running has 1.1 GB free according to the Rancher UI, which should be more than enough.

Any ideas?

superseb · October 25, 2019, 1:04pm

That same error was seen and discussed in https://github.com/rancher/rancher/issues/22063
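
If it helps to narrow things down: "parse error: Invalid numeric literal" is the kind of message a JSON parser prints when it is handed something that is not JSON (an HTML page, for example). Note also that in the logs above rancher.my.org resolves to 91.195.240.126, while the nodes are on 192.168.1.x. Below is a rough sketch in Python for checking what the server URL actually returns. It assumes the agent reads the CA settings from `/v3/settings/cacerts` under CATTLE_SERVER; that path and the helper names `is_json_object` and `check_server` are my assumptions, not something confirmed in this thread.

```python
import json
import ssl
import urllib.error
import urllib.request


def is_json_object(body: str) -> bool:
    """Return True if body parses as a JSON object, False otherwise."""
    try:
        return isinstance(json.loads(body), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def check_server(server: str, path: str = "/v3/settings/cacerts") -> bool:
    """Fetch server + path and report whether the body is a JSON object.

    The default path is an assumption about what the agent requests.
    Certificate verification is disabled on purpose: the goal here is only
    to look at the raw response, not to trust it.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(server + path, timeout=10, context=ctx) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:  # also covers HTTPError
        print(f"request failed: {exc}")
        return False
    # Show the start of the body so an HTML page is easy to spot.
    print(body[:200])
    return is_json_object(body)
```

Running `check_server("https://rancher.my.org")` from one of the nodes should return True if the hostname really points at the Rancher server. If it prints HTML and returns False, the name is probably resolving to a different host than intended.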
