Rancher 2.7 Cluster agent is not connected

History:

  • Running Rancher 2.6 with 5 AWS clusters

  • November 2022, AWS complains that Kubernetes needs to be upgraded from 1.21

  • November 2022, Upgraded Rancher to 2.7 to allow for the Kubernetes upgrade

  • December 2022, Successfully upgraded clusters to version 1.22

  • December 2022, two clusters fail; I end up deleting them as I cannot get access

  • December 2022, cannot create a new cluster on AWS (fails on timeout, clusters seem fine in the AWS UI)

  • December 2022, successfully upgraded two clusters to 1.23; one fails to upgrade to 1.24

  • January 2023, clusters are now (after the Christmas vacation) not connected

Any guides on how to reconnect the clusters?
Any guides on how to stop Rancher from trying to downgrade from 1.24 to 1.23?


Tried setting up a new Rancher 2.7 installation:

  • Imported an existing EKS cluster
  • Created a new EKS cluster with default values

Nothing in Rancher 2.7 seems to work with AWS EKS?
Should I downgrade Rancher? Will it help?

Any help is welcome…

I have tried creating new EKS clusters with Rancher 2.7 (latest), 2.6.9 and 2.7-head - all with same result.

The underlying issue is “Cluster agent is not connected”

I guess that the issue is within Amazon EKS for nodes above version 1.21…

How do I make the cluster agent connect? How do I find the reason?

Hey there,

It’s probably due to the version of k8s on EKS you’re using.
According to the docs, Rancher 2.7 supports k8s on EKS from version 1.23 onwards. If you're running 1.22, it's not officially supported.
Rancher 2.6.9 is certified from k8s 1.20 to k8s 1.23 on EKS, so that would be a better place to start until you've updated your clusters to at least k8s 1.23.
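If you're unsure which version a given EKS cluster is actually on, something like this should confirm it (the cluster name and region are placeholders, and it assumes the AWS CLI is configured):

# Show the Kubernetes version of an EKS cluster (my-cluster / eu-west-1 are placeholders)
aws eks describe-cluster --name my-cluster --region eu-west-1 --query 'cluster.version' --output text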

How are you creating the clusters? Through Rancher, or in EKS and importing them into Rancher?

Do you have any additional logging available, perhaps from the agents on the clusters themselves? The agent logs may provide some additional insight (for example, authentication errors).

You should be able to check the logs with kubectl as long as you have access to the kubeconfig, or perhaps via the EKS console.
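For example, a quick look at the agent pods and their logs on the downstream cluster would be a good start (this sketch assumes the default cattle-system namespace and the standard agent labels):

# Check that the Rancher agent pods are running on the downstream cluster
kubectl -n cattle-system get pods

# Tail the cluster agent logs and look for connection or authentication errors
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=100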

Thanks

All our EKS clusters are created with Rancher.

Manually updated all existing clusters to v1.24, but the existing clusters still had connection issues, and newly created clusters ran into connection issues as well.

Upgraded Rancher to 2.7-head and had success: new agents were deployed and are able to connect, and I was able to create new clusters.

I have created an EKS cluster through Rancher 2.7.

deployed kubernetesVersion: '1.25'

Still having the same issue. Is there any SSH access or something that I was supposed to set up?

Any luck with this issue?

I face the same issue when importing an AKS cluster (v1.24.10) on Rancher 2.7.3.

I am also running into similar issues when trying to start up a vSphere cluster (v1.25.9) on Rancher 2.7.3. It's possible I'm doing something wrong, but the symptoms are the same.

I'm having the same issues! Luckily we are just running staging stuff on Rancher. It's not production stable if you ask me. Did anyone figure out a potential solution for this?

Dear All,

I was stuck in the same situation and, after trying a lot of things, finally resolved the problem by using the same kubectl/Kubernetes version on both ends. Rancher was running with Kubernetes version 1.26.4 and I was using version 1.27.3. After downgrading to 1.26.4, the agent connected with no errors.
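If it helps to compare the two, kubectl reports both its own client version and the API server's version; a client more than one minor version ahead of the server is outside the supported skew (a general note, not something from the post above):

# Shows the kubectl client version and the Kubernetes API server version
kubectl version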

Edit the aws-auth ConfigMap in EKS and add the IAM user with EKS access to make this work, like below:
apiVersion: v1
data:
  mapRoles: |
    - groups:
        - system:bootstrappers
        - system:nodes
      rolearn: arn:aws:iam::xxxxxxxx:role/eksctl-mycluster-nodegroup-eksdem-NodeInstanceRole-KP5A9ZLNY7CC
      username: system:node:{{EC2PrivateDNSName}}
  mapUsers: |
    - userarn: arn:aws:iam::051542606790:user/eks
      username: eks
      groups:
        - system:masters
kind: ConfigMap
metadata:
  creationTimestamp: "2023-07-20T03:34:45Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "916206"
  uid: 9b3d0a03-093f-4a65-b174-d866d8fab748
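If you need to make the same change, editing the ConfigMap in place is the simplest route; eksctl can also manage the mapping (the cluster name, region, and ARN below are placeholders):

# Edit the aws-auth ConfigMap on the EKS cluster directly
kubectl -n kube-system edit configmap aws-auth

# Or add the mapping with eksctl (placeholder cluster/region/ARN values)
eksctl create iamidentitymapping --cluster my-cluster --region eu-west-1 \
  --arn arn:aws:iam::111111111111:user/eks --username eks --group system:masters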

Have you tried something like below from the local management cluster?

  1. kubectl patch clusters.management.cattle.io <CLUSTERID> -p '{"status":{"agentImage":"dummy"}}' --type merge
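If it helps, the <CLUSTERID> for that command can be looked up on the Rancher management (local) cluster; the object names are the c-xxxxx style IDs:

# Run against the Rancher management (local) cluster kubeconfig
kubectl get clusters.management.cattle.io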