SOLVED - Rancher Inaccessible - 503 Service Temporarily Unavailable

Hi there,

I’m currently unable to access my Rancher local cluster’s dashboard. When I try from my web browser, I get “503 Service Temporarily Unavailable” and occasionally “502 Bad Gateway”. The dashboard becomes available intermittently, and during those periods I can see that the rancher pods in the cattle-system namespace are in a CrashLoopBackOff state. The following screenshots are of the rancher pod logs, where it looks like there’s a kernel panic of some sort.

I have a theory that this is because I’ve updated an autoscaling group in AWS that was assigned to one of the clusters this Rancher instance manages. Since making that change I’ve been presented with a message stating “Sync Error” for that node group. I have since reverted the settings in AWS but that has not changed anything.

Additionally, I deleted an unused node group in the cluster config shortly before the issues arose.

What can I do to get this service back up and running? It’s very high priority, as I’m now unable to access any of my production clusters since their kubeconfigs are proxied through Rancher.

This issue has been resolved by editing the cluster object and setting spec.eksConfig.imported to true for the affected cluster. You can do this by running kubectl edit against the Rancher local cluster, e.g. kubectl edit clusters.management.cattle.io {cluster-id}.
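
For anyone who would rather not use an interactive editor, here is a rough sketch of the same change as a merge patch. It assumes kubectl is pointed directly at the Rancher local cluster (not through the Rancher proxy) and that the cluster object is the clusters.management.cattle.io resource. The function name and the c-xxxxx ID are placeholders of mine, not something from the original fix.

```shell
# Merge patch that sets spec.eksConfig.imported to true, as in the fix above.
PATCH='{"spec":{"eksConfig":{"imported":true}}}'

# Hypothetical helper: takes the affected cluster's ID (e.g. c-xxxxx).
# Assumes the current kubectl context is the Rancher local cluster.
mark_eks_cluster_imported() {
  cluster_id="$1"

  # Print the current value first so the change can be reverted if needed.
  kubectl get clusters.management.cattle.io "$cluster_id" \
    -o jsonpath='{.spec.eksConfig.imported}'

  # Apply the merge patch to the cluster object.
  kubectl patch clusters.management.cattle.io "$cluster_id" \
    --type merge -p "$PATCH"
}

# Usage (replace c-xxxxx with the real cluster ID):
# mark_eks_cluster_imported c-xxxxx
```

The cluster IDs can be listed with kubectl get clusters.management.cattle.io. Afterwards it may be worth watching kubectl -n cattle-system get pods to confirm the rancher pods stop restarting.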

This has prevented Rancher from crashing and I’m able to access the services correctly now.

Version 2.6.3 resolves the bug that caused this issue.