I’m currently unable to access my Rancher local cluster’s dashboard. When I try from my web browser I get “503 Service Temporarily Unavailable” and occasionally “502 Bad Gateway”. The dashboard becomes intermittently available, and when I can access it during those periods I can see that the rancher pods in the cattle-system namespace are in a CrashLoopBackOff state. The following screenshots are of the rancher pod logs, where it looks like there’s a kernel panic of some sort.
I have a theory that this is because I’ve updated an autoscaling group in AWS that was assigned to one of the clusters this Rancher instance manages. Since making that change I’ve been presented with a message stating “Sync Error” for that node group. I have since reverted the settings in AWS but that has not changed anything.
Additionally, I deleted an unused node group in the cluster config shortly before the issues arose.
What can I do to get this service back up and running? It’s very high priority, as I’m now unable to access any of my production clusters since their kubeconfig is proxied through Rancher.