Rancher 2.2.2 certificate expiration issues

phrenq · May 5, 2020, 6:15am

Hi,

Our Rancher 2.2.2 installation (upgraded from 2.0.8 one year ago) went into an error state: the etcd server would not start, complaining of an expired certificate.

We removed the certificates that had indeed expired from /var/lib/rancher/management-state/tls (localhost.crt and token-node.crt) and restarted the Rancher container. This fixed Rancher; however, it now cannot connect to the single cluster it manages (it seems that Rancher doesn’t have the correct credentials for the cluster).

The cluster seems to be alive, but we have no way to check it.

Does someone have an idea how to recover the communication between Rancher and Kubernetes?

Thank you in advance

justinwcarter · May 7, 2020, 10:26am

I have the exact same issue with a Rancher v2.2.9 Docker installation. The localhost.crt and token-node.crt certs in /var/lib/rancher/management-state/tls have expired, so the Rancher container is restarting every 11-12 seconds.
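One way to confirm which certificates have expired before touching anything is to ask openssl for each file's notAfter date. This is only a minimal sketch, assuming openssl is installed wherever the files are readable. The function name check_cert_expiry is made up for illustration, and the paths in the usage comment are the ones from the rm commands quoted later in this thread.

```shell
# Print the validity status and notAfter date of each certificate file given.
# check_cert_expiry is a hypothetical helper name, not part of Rancher.
check_cert_expiry() {
  for cert in "$@"; do
    if [ ! -f "$cert" ]; then
      echo "$cert: not found"
      continue
    fi
    # -checkend 0 exits non-zero when the certificate has already expired
    # (or when the file cannot be parsed as a certificate).
    if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
      status="valid"
    else
      status="EXPIRED or unreadable"
    fi
    # -enddate prints a line of the form "notAfter=<date>".
    end=$(openssl x509 -enddate -noout -in "$cert" 2>/dev/null)
    echo "$cert: $status (${end:-notAfter unavailable})"
  done
}

# Example usage (paths as quoted in this thread; adjust to your install):
#   check_cert_expiry \
#     /var/lib/rancher/management-state/tls/localhost.crt \
#     /var/lib/rancher/management-state/tls/token-node.crt
```

With a Docker install the files live inside the Rancher container or its data volume, so the function has to run somewhere that can read them.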

The logs show many cert errors, repeated over and over until the process stops:

2020/05/07 07:15:24 [INFO] Waiting for server to become available: Get https://localhost:6443/version?timeout=30s: x509: certificate has expired or is not yet valid
2020-05-07 07:15:24.815346 I | http: TLS handshake error from 127.0.0.1:43826: remote error: tls: bad certificate
2020-05-07 07:15:24.828573 I | http: TLS handshake error from 127.0.0.1:43876: remote error: tls: bad certificate
E0507 07:15:24.856329       5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.ReplicaSet: Get https://localhost:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.857714       5 reflector.go:134] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.861677       5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.862446       5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.PersistentVolume: Get https://localhost:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.863244       5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
2020-05-07 07:15:24.863976 I | http: TLS handshake error from 127.0.0.1:43888: remote error: tls: bad certificate

If I set the system clock to a date before the certificates expired, Rancher can start up; however, that’s only a viable workaround for the very, very short term.

@vincent Have you guys come across this issue before? This is the second Rancher server I’ve seen it on, and I can’t find any official workaround or fix. My solution the first time was to deploy a new Rancher server and build a new cluster, because I was crunched for time.

A community member in Slack (not a Rancher staffer) suggested the following (NOTE: I haven’t personally tried this yet, so as a warning to others, I’m not recommending it at this time, just asking a question!):

rm /etc/kubernetes/ssl/*
rm /var/lib/rancher/management-state/certs/bundle.json
rm /var/lib/rancher/management-state/tls/token-node.crt
rm /var/lib/rancher/management-state/tls/localhost.crt

Would it be possible for someone at Rancher to verify if the above is a potential solution, or perhaps suggest a safe alternative?

justinwcarter · May 8, 2020, 7:44am

Just a quick update: I tried the workaround today and can confirm that removing the 3 files fixed my problem. The /etc/kubernetes/ssl path didn’t exist for me, however.
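For anyone repeating the removal, a more cautious variant is to move the files aside rather than delete them, so they can be put back if Rancher does not recover. The following is a sketch under assumptions, not a verified Rancher procedure: backup_and_remove is a made-up helper name and the backup directory in the usage comment is hypothetical. Only the three file paths come from this thread.

```shell
# Move each listed file into a backup directory instead of deleting it.
# backup_and_remove is a hypothetical helper name, not a Rancher tool.
backup_and_remove() {
  backup_dir=$1
  shift
  mkdir -p "$backup_dir" || return 1
  for f in "$@"; do
    if [ -e "$f" ]; then
      # mv keeps a copy in the backup directory, so the step can be undone.
      mv -- "$f" "$backup_dir/" && echo "moved $f"
    else
      # Missing paths are reported and skipped rather than treated as errors.
      echo "skipped $f (not present)"
    fi
  done
}

# Example usage (the backup directory is an assumption; the file paths are
# the ones quoted in this thread):
#   backup_and_remove /root/rancher-cert-backup \
#     /var/lib/rancher/management-state/certs/bundle.json \
#     /var/lib/rancher/management-state/tls/token-node.crt \
#     /var/lib/rancher/management-state/tls/localhost.crt
```

The Rancher container still has to be restarted afterwards, as described in the posts above.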

I’ve logged an issue on GitHub with all the details.

manuel-koch · October 30, 2020, 9:29am

I have the same issue (but my Rancher server version is v2.3.2) and tried to solve it by following one of the suggested solutions from the GitHub ticket mentioned above.
See my comment at https://github.com/rancher/rancher/issues/26984#issuecomment-718898677

Unfortunately, my Rancher server is now stuck at:
2020/10/29 16:55:39 [INFO] Waiting for server to become available: Get https://localhost:6443/version?timeout=30s: x509: certificate signed by unknown authority

Does anybody have an idea how to fix this Rancher server issue?
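The two x509 messages in this thread describe different failures: "certificate has expired or is not yet valid" is about the validity period, while "certificate signed by unknown authority" means the client does not trust the CA that issued the certificate it was shown. To see which one a log is dominated by, the lines can be counted. A rough sketch, where classify_x509 is a made-up helper name and `<container>` in the usage comment is a placeholder for your Rancher container:

```shell
# Read log text on stdin and print a count for each x509 error type.
# classify_x509 is a hypothetical helper name.
classify_x509() {
  awk '
    /x509: certificate has expired or is not yet valid/ { expired++ }
    /x509: certificate signed by unknown authority/     { unknown++ }
    END {
      printf "expired_or_not_yet_valid=%d\n", expired
      printf "unknown_authority=%d\n", unknown
    }'
}

# Example usage (<container> is a placeholder):
#   docker logs <container> 2>&1 | classify_x509
```

A log that shows only the unknown-authority message after files were removed would be consistent with a mix of old and newly generated certificates, though that is an inference and not something confirmed in this thread.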

manuel-koch · November 2, 2020, 11:06am

As mentioned here https://github.com/rancher/rancher/issues/26984#issuecomment-720320606
I was able to resolve my issue by also deleting some other files (/var/lib/rancher/management-state/tls) and restarting the Rancher server.

Ca_Husky_TV · March 8, 2023, 2:42am

I followed your comment, but I still get an error:

7 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.StatefulSet: Get https://localhost:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0: x509: certificate signed by unknown authority

My Rancher version: v2.2.4
