Cluster cannot be accessed trough rancher after Rancher upgrade

Hello,

I don’t know what to do anymore.
In my company I have two clusters. One is the local cluster which is accessable trough rancher and rancher is installed on. Second cluster is the created cluster trough Rancher. Some time back when I upgraded the second cluster it became stuck in provisioning state and was showing waitng for plan to be applied, but everything worked fine (accessable trough rancher an everything was working perfectly).

So 2 days ago I wanted to upgrade Rancher from 2.7.4 to 2.7.6 with the helm command to upgrade Rancher (I saved my config to file and passed it whith the upgrade command, I also added the global.cattle.psp.enabled to false as stated in the Rancher docs). The upgrade ran smoothly on the local cluster everything upgraded and on the second cluster everything upgraded as well, but when the upgrade finished the second cluster became unavailable to open in rancher (I could only access it trough the URL which I had it saved previously - screenshots).

Under cluster management I saw my cluser is still in provisioning state but insted of saying waiting for plan to be applied it said: Configuring bootstrap node(s) custom-040d4ae88aa7: waiting for probes: kube-controller-manager, kube-scheduler

The nodes are also in reconciling.

local cluster:
RKE2 - v1.25.11+rke2r1

second cluster:
RKE2 - 1.25.11+rke2r1

I havent found any useful logs about this issue.

I have banging my head with this for 2 days and no luck. Any help would be greatly aprechiated :slight_smile:

Thank you all!

Problem was expired kube-controller-manager and kube-scheduler cacert.

Solved with this method:

Check the cert:
openssl x509 -text -in /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt | grep -A 2 Validity

image

Check probes if they are okay:

(
curl  --cacert /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt \
  https://127.0.0.1:10257/healthz >/dev/null 2>&1 \
  && echo "[OK] Kube Controller probe" \
  || echo "[FAIL] Kube Controller probe";

curl --cacert /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt \
  https://127.0.0.1:10259/healthz >/dev/null 2>&1  \
  && echo "[OK] Scheduler probe" \
  || echo "[FAIL] Scheduler probe";
)

Force rotate certificates (you cannot force rotate form rancher - I tried and it does nothing):

echo "Rotating kube-controller-manager certificate"
rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
crictl rm -f $(crictl ps -q --name kube-controller-manager)
echo "Rotating kube-scheduler certificate"
rm /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.{crt,key}
crictl rm -f $(crictl ps -q --name kube-scheduler)

If crictl is not working:

export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
export CONTAINERD_ADDRESS=unix:///run/k3s/containerd/containerd.sock
export PATH=$PATH:/var/lib/rancher/rke2/bin