This is probably a very basic problem, but I cannot seem to figure out how to resolve it. I have a Rancher 2.6.5 installation on a single-node local K3s cluster, and it also manages a remote 5-node K3s cluster. Everything was fine for a year. After that year, a certificate…somewhere…expired. I get the following error on the cattle agent pod on the remote cluster:
time="2022-07-19T03:55:56Z" level=fatal msg="Server certificate is not valid, please check if the host has the correct time configured and if the server certificate has a notAfter date and time in the future. Certificate information is displayed above. error: Get \"https://rancher.example.com\": x509: certificate has expired or is not yet valid: current time 2022-07-19T03:55:56Z is after 2022-04-08T03:20:16Z"
I can see, obviously, that the error is caused by an expired certificate. I’m even fairly confident I can successfully renew it with a little Googling. The problem is that I can’t figure out which one expired. When I look through all the cert directories I can find related to the K3s clusters and Rancher itself, every cert I find is still valid (at least according to OpenSSL). Can anyone provide me with some direction on where to look for the relevant cert(s)?
You may be able to find specifics in the Rancher docs, but generally speaking K3s and RKE2 renew a variety of certs used for internal Kubernetes communication when you restart them. Those certs are valid for a year, and any cert that is within 90 days of expiring gets renewed when the service restarts. So restarting your K3s service should do the trick.
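If it helps, here is a rough sketch for listing which certs on a server node fall inside that 90-day window. It assumes the default K3s data directory (`/var/lib/rancher/k3s/server/tls`) and that `openssl` is installed; the `expires_within` helper is just a name I made up for illustration. Note that `openssl x509` only reads the first certificate in a file, and a file it cannot parse will also be reported.

```shell
#!/bin/sh
# Report certificates that have expired or will expire within a given window.
# Assumption: default K3s data dir; override TLS_DIR if yours differs.
TLS_DIR="${TLS_DIR:-/var/lib/rancher/k3s/server/tls}"
WINDOW_SECONDS=$((90 * 24 * 60 * 60))   # 90 days

# expires_within FILE SECONDS
# Succeeds if the cert in FILE is expired, unreadable, or will expire
# within SECONDS from now. (Hypothetical helper, not part of K3s.)
expires_within() {
    # `openssl x509 -checkend N` exits 0 when the cert is still valid
    # N seconds from now, so we negate its result.
    ! openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1
}

for crt in "$TLS_DIR"/*.crt; do
    [ -f "$crt" ] || continue   # skip if the glob matched nothing
    if expires_within "$crt" "$WINDOW_SECONDS"; then
        echo "RENEWAL DUE: $crt ($(openssl x509 -in "$crt" -noout -enddate))"
    fi
done
```

Run it with `sudo`, since the files under that directory are usually only readable by root. If nothing is printed, none of the certs in that directory are within the window.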
Thanks! Do you happen to know if that process changes at all if the certs have already expired? I’ve restarted my K3s clusters multiple times (restarting both services and nodes, just in case) with no change. I also attempted the certificate rotation subcommand for K3s like this:
$ sudo systemctl stop k3s
$ sudo k3s certificate rotate
INFO Server detected, rotating server certificates
INFO Rotating certificates for admin service
INFO Rotating certificates for etcd service
INFO Rotating certificates for api-server service
INFO Rotating certificates for controller-manager service
INFO Rotating certificates for cloud-controller service
INFO Rotating certificates for scheduler service
INFO Rotating certificates for k3s-server service
INFO Rotating dynamic listener certificate
INFO Rotating certificates for k3s-controller service
INFO Rotating certificates for auth-proxy service
INFO Rotating certificates for kubelet service
INFO Rotating certificates for kube-proxy service
INFO Successfully backed up certificates for all services to path /var/lib/rancher/k3s/server/tls-1658241938, please restart k3s server or agent to rotate certificates
I then restarted the cattle-cluster-agent deployment, and the certificate validation error persisted. I feel like I must just be doing things in the wrong order, missing a step somewhere, etc., but I don’t see anything missing according to the documentation I’ve gone through so far.
Sadly I don’t know. I know there are some Rancher documentation pages that people get pointed to in Slack, but I haven’t needed them, so I haven’t looked them up (and I’m using RKE2 instead of K3s, so there are some differences).
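One thing that might narrow it down: the error is raised while fetching `https://rancher.example.com`, so the certificate being rejected is whatever that endpoint presents over TLS, which is not necessarily any of the files in the K3s cert directories. A minimal sketch for inspecting it, assuming `openssl` is available and the host is reachable on port 443 from where you run it (the function names here are made up for illustration):

```shell
#!/bin/sh
# cert_summary: read a PEM certificate on stdin and print its subject,
# issuer, and notBefore/notAfter dates.
cert_summary() {
    openssl x509 -noout -subject -issuer -dates
}

# served_cert_summary HOST [PORT]
# Fetch the leaf certificate a TLS endpoint serves and summarize it.
served_cert_summary() {
    # s_client prints the server certificate in PEM form; closing stdin
    # (</dev/null) makes it exit right after the handshake.
    openssl s_client -connect "$1:${2:-443}" -servername "$1" \
        </dev/null 2>/dev/null | cert_summary
}

# Example, using the hostname from the error message:
# served_cert_summary rancher.example.com
```

If the `notAfter` it prints matches the date in the error (2022-04-08T03:20:16Z), the issuer and subject lines should hint at which component issued the cert and therefore where to renew it.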