Cluster cannot be accessed through Rancher after Rancher upgrade

Nik_Perne · October 19, 2023, 8:56am

Hello,

I don’t know what to do anymore.
In my company I have two clusters. One is the local cluster, which Rancher is installed on and which is accessible through Rancher. The second is the cluster created through Rancher. Some time back, when I upgraded the second cluster, it became stuck in the Provisioning state and showed "waiting for plan to be applied", but everything worked fine (it was accessible through Rancher and everything was working perfectly).

So 2 days ago I wanted to upgrade Rancher from 2.7.4 to 2.7.6 with the helm upgrade command (I saved my config to a file and passed it with the upgrade command; I also set global.cattle.psp.enabled to false as stated in the Rancher docs). The upgrade ran smoothly: everything upgraded on the local cluster and on the second cluster as well. But when the upgrade finished, the second cluster could no longer be opened in Rancher (I could only access it through a URL which I had saved previously - screenshots).

Under Cluster Management I saw that my cluster is still in the Provisioning state, but instead of saying "waiting for plan to be applied" it said: Configuring bootstrap node(s) custom-040d4ae88aa7: waiting for probes: kube-controller-manager, kube-scheduler

The nodes are also in the Reconciling state.

local cluster:
RKE2 - v1.25.11+rke2r1

second cluster:
RKE2 - 1.25.11+rke2r1

I haven't found any useful logs about this issue.

I have been banging my head against this for 2 days with no luck. Any help would be greatly appreciated.

Thank you all!

Nik_Perne · November 2, 2023, 12:35pm

The problem was that the kube-controller-manager and kube-scheduler certificates (the ones the probes validate against) had expired.

Solved with this method:

Check the cert:
openssl x509 -text -in /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt | grep -A 2 Validity
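
To check both certificates in one go, something like the following should work. It is only a sketch: it assumes openssl is installed on the server node, and check_cert is a helper name I made up, not part of RKE2. It relies on openssl's -checkend option, which exits non-zero when the certificate expires within the given number of seconds (0 here, so "already expired").

```shell
# check_cert is a hypothetical helper, not an RKE2 command.
# Prints one status line for the certificate file given as $1.
check_cert() {
  cert="$1"
  if [ ! -f "$cert" ]; then
    echo "[MISSING] $cert"
    return 2
  fi
  # -checkend 0 exits non-zero if the certificate has already expired
  # (or if the file cannot be parsed as a certificate).
  if openssl x509 -noout -checkend 0 -in "$cert" >/dev/null 2>&1; then
    echo "[OK] $cert"
  else
    echo "[EXPIRED or unreadable] $cert"
    return 1
  fi
}

# Paths taken from the commands in this post.
tls_dir=/var/lib/rancher/rke2/server/tls
for name in kube-controller-manager kube-scheduler; do
  check_cert "$tls_dir/$name/$name.crt" || true
done
```

Run it as root on the affected server node, since the tls directory is normally not readable by other users.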

Check whether the probes are okay:

(
curl  --cacert /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt \
  https://127.0.0.1:10257/healthz >/dev/null 2>&1 \
  && echo "[OK] Kube Controller probe" \
  || echo "[FAIL] Kube Controller probe";

curl --cacert /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt \
  https://127.0.0.1:10259/healthz >/dev/null 2>&1  \
  && echo "[OK] Scheduler probe" \
  || echo "[FAIL] Scheduler probe";
)

Force-rotate the certificates (you cannot force-rotate them from Rancher - I tried and it does nothing):

echo "Rotating kube-controller-manager certificate"
rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
crictl rm -f $(crictl ps -q --name kube-controller-manager)

echo "Rotating kube-scheduler certificate"
rm /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.{crt,key}
crictl rm -f $(crictl ps -q --name kube-scheduler)
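
After removing the containers, it can take a little while before the pods come back and the certificate files are written again. A rough sketch for waiting on that, assuming the files reappear under the same paths once the components restart (that is what happened in my case). wait_for_cert, its retry count and its interval are all my own invention:

```shell
# wait_for_cert is a hypothetical helper, not an RKE2 command.
# $1 = certificate path, $2 = number of tries (default 30),
# $3 = seconds to sleep between tries (default 2).
wait_for_cert() {
  cert="$1"
  tries="${2:-30}"
  interval="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s is true once the file exists and is non-empty.
    if [ -s "$cert" ]; then
      echo "[OK] $cert is present"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "[FAIL] $cert did not reappear"
  return 1
}

# Example, run as root on the server node:
#   wait_for_cert /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt
#   wait_for_cert /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt
```

Once both files are back, rerun the certificate check and the probe check from above to confirm the new certificates are valid.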

If crictl is not working:

export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
export CONTAINERD_ADDRESS=unix:///run/k3s/containerd/containerd.sock
export PATH=$PATH:/var/lib/rancher/rke2/bin
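
A quick way to confirm the exports took effect, as a sketch only. require_tool is a made-up helper name; the crictl ps --name calls are the same ones used in the rotate step, just without -q so the container state is visible:

```shell
# require_tool is a hypothetical helper, not part of RKE2 or crictl.
# Reports whether the command named in $1 can be found in PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "[OK] $1 found at $(command -v "$1")"
  else
    echo "[FAIL] $1 not found in PATH"
    return 1
  fi
}

# Assumes the three exports above were run in this same shell.
if require_tool crictl; then
  crictl ps --name kube-controller-manager
  crictl ps --name kube-scheduler
fi
```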
