Rancher 2.5.9 Can't import Cluster: cattle-cluster-agent CrashLoopBackOff

Hi there!

I’m failing to import a new custom cluster into Rancher 2.5.9.
When I apply the registration command, I can see the cattle-cluster-agent failing to start and hanging in CrashLoopBackOff.

Here is the log output from the cattle-cluster-agent:

INFO: Environment: CATTLE_ADDRESS=10.230.244.204 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-94c7f6d94-zz5hc CATTLE_SERVER=https://beta-rancher.*DOMAIN*.com
INFO: Using resolv.conf: nameserver 10.96.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5
INFO: https://beta-rancher.*DOMAIN*.com/ping is accessible
INFO: beta-rancher.*DOMAIN*.com resolves to 172.23.0.90
time="2021-09-10T12:12:53Z" level=info msg="Listening on /tmp/log.sock"
time="2021-09-10T12:12:53Z" level=info msg="Rancher agent version v2.5.9 is starting"
time="2021-09-10T12:12:53Z" level=info msg="Connecting to wss://beta-rancher.*DOMAIN*.com/v3/connect/register with token p5q9s6l5h58ch724xjv6zctwv62xrk64qwxnsx8w4fjtxtks5h5ncd"
time="2021-09-10T12:12:53Z" level=info msg="Connecting to proxy" url="wss://beta-rancher*DOMAIN*.com/v3/connect/register"
time="2021-09-10T12:12:53Z" level=info msg="Starting user controllers"
time="2021-09-10T12:12:53Z" level=fatal msg="the server could not find the requested resource"

And here is the output of the describe command:

Name:         cattle-cluster-agent-94c7f6d94-xwnvk
Namespace:    cattle-system
Priority:     0
Node:         jmck8w110/172.23.0.110
Start Time:   Fri, 10 Sep 2021 12:22:24 +0000
Labels:       app=cattle-cluster-agent
              pod-template-hash=94c7f6d94
Annotations:  cni.projectcalico.org/containerID: a37f16dabd2df2ef93802b4d39d176e31100c0cb4d0fda79f47937ffa588d617
              cni.projectcalico.org/podIP: 10.230.244.205/32
              cni.projectcalico.org/podIPs: 10.230.244.205/32
Status:       Running
IP:           10.230.244.205
IPs:
  IP:           10.230.244.205
Controlled By:  ReplicaSet/cattle-cluster-agent-94c7f6d94
Containers:
  cluster-register:
    Container ID:   docker://ef2a86da547fd9759b43feef1f546a31c3d4f2cc2136b78892e89578068a0e2b
    Image:          rancher/rancher-agent:v2.5.9
    Image ID:       docker-pullable://rancher/rancher-agent@sha256:fee10940c61c36055120ca9ef624810a740647cff4751d0b21266939c9da4b93
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 10 Sep 2021 12:22:27 +0000
      Finished:     Fri, 10 Sep 2021 12:22:27 +0000
    Ready:          False
    Restart Count:  1
    Readiness:      http-get http://:8080/health delay=2s timeout=1s period=5s #success=1 #failure=3
    Environment:
      CATTLE_FEATURES:
      CATTLE_IS_RKE:            false
      CATTLE_SERVER:            https://beta-rancher.*DOMAIN*.com
      CATTLE_CA_CHECKSUM:
      CATTLE_CLUSTER:           true
      CATTLE_K8S_MANAGED:       true
      CATTLE_CLUSTER_REGISTRY:
    Mounts:
      /cattle-credentials from cattle-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z65cj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cattle-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cattle-credentials-e1ac756
    Optional:    false
  kube-api-access-z65cj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/controlplane=true:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  10s              default-scheduler  Successfully assigned cattle-system/cattle-cluster-agent-94c7f6d94-xwnvk to jmck8w110
  Normal   Pulled     7s (x2 over 9s)  kubelet            Container image "rancher/rancher-agent:v2.5.9" already present on machine
  Normal   Created    7s (x2 over 9s)  kubelet            Created container cluster-register
  Normal   Started    7s (x2 over 9s)  kubelet            Started container cluster-register
  Warning  BackOff    4s (x3 over 6s)  kubelet            Back-off restarting failed container

All the servers (K8s nodes and Rancher) are in the same subnet, DNS seems to work just fine, and given this output I can’t figure out where the problem is. I’m not sure what the “the server could not find the requested resource” entry in the log is referring to. Can anybody help?
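For what it’s worth, the DNS and reachability checks the agent performs at startup can be reproduced from inside the cluster with something like the following (a sketch; the pod name and `curlimages/curl` image are arbitrary examples, and `*DOMAIN*` is the same placeholder as in the logs):

```shell
# Run a throwaway pod in cattle-system to test DNS and HTTPS
# from the same network position as the agent
kubectl -n cattle-system run net-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  sh -c 'nslookup beta-rancher.*DOMAIN*.com && curl -sk https://beta-rancher.*DOMAIN*.com/ping'
# If the name resolves and /ping answers, basic connectivity is fine,
# which matches the "ping is accessible" line in the agent log
```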

I’m running K8s 1.22 on Ubuntu 20.04 LTS.

Thank you very much!

I re-installed the cluster, but it did not help. I then tried installing the previous version of Kubernetes (1.21.4) and retried. With this version the cattle-cluster-agent pod starts with no problems and I could import the cluster into Rancher.

I still don’t understand whether it’s a bug or whether Kubernetes 1.22.1 requires additional steps for the import. Anyway, I’ll stick with K8s 1.21.4 for now.


Any update regarding this? I have the same issue with K8s 1.22.x.

Hi,

Check the Kubernetes versions supported by each Rancher version:

Only Rancher 2.6.4 and later support Kubernetes 1.22.
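This would also explain the fatal log line: Kubernetes 1.22 removed several long-deprecated beta APIs, and an agent built against them gets exactly “the server could not find the requested resource”. You can check which API group versions your cluster still serves with something like this (a sketch; requires a working kubeconfig, and the grep pattern just picks two groups whose v1beta1 variants were removed in 1.22):

```shell
# List served API versions and look for the beta groups removed in 1.22
kubectl api-versions | grep -E 'apiextensions|admissionregistration'
# On 1.22+ only the .../v1 variants are listed; a 2.5.x agent that
# requests a removed .../v1beta1 resource fails with
# "the server could not find the requested resource"
```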

Well, to complete our status: we were able to resolve the issue. In our case it was not possible to deploy a new cluster or to go back to version 1.21.
Once we figured it out, it was easy. With the cluster at 1.22, we upgraded to Rancher 2.6.11 and then manually edited the cattle-cluster-agent deployment to use the v2.6.11 agent image.
Once we had edited it, the agent deployed successfully and was able to connect to GKE.
Greetings!
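For anyone following along, that manual edit can be done with `kubectl set image` (a sketch; the deployment and container names are taken from the describe output earlier in the thread, and the target tag assumes your Rancher server is already on 2.6.11):

```shell
# Point the cluster agent at the newer agent image so it matches
# the upgraded Rancher server, then wait for the rollout to finish
kubectl -n cattle-system set image deployment/cattle-cluster-agent \
  cluster-register=rancher/rancher-agent:v2.6.11
kubectl -n cattle-system rollout status deployment/cattle-cluster-agent
```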