Hi there!
I’m unable to import a new custom cluster into Rancher 2.5.9.
When I apply the registration command, the cattle-cluster-agent fails to start and gets stuck in CrashLoopBackOff.
Here is the log output from the cattle-cluster-agent:
INFO: Environment: CATTLE_ADDRESS=10.230.244.204 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-94c7f6d94-zz5hc CATTLE_SERVER=https://beta-rancher.*DOMAIN*.com
INFO: Using resolv.conf: nameserver 10.96.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5
INFO: https://beta-rancher.*DOMAIN*.com/ping is accessible
INFO: beta-rancher.*DOMAIN*.com resolves to 172.23.0.90
time="2021-09-10T12:12:53Z" level=info msg="Listening on /tmp/log.sock"
time="2021-09-10T12:12:53Z" level=info msg="Rancher agent version v2.5.9 is starting"
time="2021-09-10T12:12:53Z" level=info msg="Connecting to wss://beta-rancher.*DOMAIN*.com/v3/connect/register with token p5q9s6l5h58ch724xjv6zctwv62xrk64qwxnsx8w4fjtxtks5h5ncd"
time="2021-09-10T12:12:53Z" level=info msg="Connecting to proxy" url="wss://beta-rancher*DOMAIN*.com/v3/connect/register"
time="2021-09-10T12:12:53Z" level=info msg="Starting user controllers"
time="2021-09-10T12:12:53Z" level=fatal msg="the server could not find the requested resource"
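In case it helps anyone skimming the log, here is a minimal Python sketch that pulls out only the fatal or error entries. It assumes the `key=value` / `key="quoted value"` line layout seen in the output above; the regular expression and the names `parse_line` and `fatal_entries` are my own illustration and are not part of Rancher or any of its tooling.

```python
import re

# Assumption: each field is key=value, where the value is either a
# double-quoted string or a bare token without spaces (as in the log above).
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key/value fields of one log line as a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding double quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def fatal_entries(lines):
    """Keep only the entries whose level is fatal or error."""
    parsed = (parse_line(line) for line in lines)
    return [f for f in parsed if f.get("level") in ("fatal", "error")]


# Two lines copied from the agent log above.
SAMPLE = [
    'time="2021-09-10T12:12:53Z" level=info msg="Starting user controllers"',
    'time="2021-09-10T12:12:53Z" level=fatal msg="the server could not find the requested resource"',
]

if __name__ == "__main__":
    for entry in fatal_entries(SAMPLE):
        print(entry["time"], entry["level"], entry["msg"])
```

Lines without a `level` field, such as the `INFO:` lines at the top of the log, are simply skipped.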
And here is the output of the describe command:
Name:         cattle-cluster-agent-94c7f6d94-xwnvk
Namespace:    cattle-system
Priority:     0
Node:         jmck8w110/172.23.0.110
Start Time:   Fri, 10 Sep 2021 12:22:24 +0000
Labels:       app=cattle-cluster-agent
              pod-template-hash=94c7f6d94
Annotations:  cni.projectcalico.org/containerID: a37f16dabd2df2ef93802b4d39d176e31100c0cb4d0fda79f47937ffa588d617
              cni.projectcalico.org/podIP: 10.230.244.205/32
              cni.projectcalico.org/podIPs: 10.230.244.205/32
Status:       Running
IP:           10.230.244.205
IPs:
  IP:           10.230.244.205
Controlled By:  ReplicaSet/cattle-cluster-agent-94c7f6d94
Containers:
  cluster-register:
    Container ID:   docker://ef2a86da547fd9759b43feef1f546a31c3d4f2cc2136b78892e89578068a0e2b
    Image:          rancher/rancher-agent:v2.5.9
    Image ID:       docker-pullable://rancher/rancher-agent@sha256:fee10940c61c36055120ca9ef624810a740647cff4751d0b21266939c9da4b93
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 10 Sep 2021 12:22:27 +0000
      Finished:     Fri, 10 Sep 2021 12:22:27 +0000
    Ready:          False
    Restart Count:  1
    Readiness:      http-get http://:8080/health delay=2s timeout=1s period=5s #success=1 #failure=3
    Environment:
      CATTLE_FEATURES:
      CATTLE_IS_RKE:            false
      CATTLE_SERVER:            https://beta-rancher.*DOMAIN*.com
      CATTLE_CA_CHECKSUM:
      CATTLE_CLUSTER:           true
      CATTLE_K8S_MANAGED:       true
      CATTLE_CLUSTER_REGISTRY:
    Mounts:
      /cattle-credentials from cattle-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z65cj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cattle-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cattle-credentials-e1ac756
    Optional:    false
  kube-api-access-z65cj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                 node-role.kubernetes.io/controlplane=true:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  10s              default-scheduler  Successfully assigned cattle-system/cattle-cluster-agent-94c7f6d94-xwnvk to jmck8w110
  Normal   Pulled     7s (x2 over 9s)  kubelet            Container image "rancher/rancher-agent:v2.5.9" already present on machine
  Normal   Created    7s (x2 over 9s)  kubelet            Created container cluster-register
  Normal   Started    7s (x2 over 9s)  kubelet            Started container cluster-register
  Warning  BackOff    4s (x3 over 6s)  kubelet            Back-off restarting failed container
All the servers (KE nodes and Rancher) are in the same subnet and DNS seems to work just fine, so given this output I can’t figure out where the problem is. I’m not sure what the “the server could not find the requested resource” entry in the log refers to. Can anybody help?
I’m running KE 1.22 on Ubuntu 20.04 LTS.
Thank you very much!