Hi all,
I’m running Rancher 2.4.5 and recently ran into the expired-certificates issue. The UI is not accessible, and the logs contain error messages pointing to expired certificates.
I’ve followed a number of solutions online, including this GitHub issue:
(Issue opened 07:36AM, 08 May 2020 UTC; closed 02:41PM, 14 Apr 2022 UTC; label: status/stale)
**What kind of request is this (question/bug/enhancement/feature request):**
Bug
**Steps to reproduce (least amount of steps as possible):**
- Install Rancher v2.0.0, upgrade to v2.0.2 -> v2.0.4 -> v2.0.8
- Upgrade to v2.1.6
- One year after Rancher v2.0.0 was installed, certificates expire and cluster becomes "unavailable"
- Upgrade to v2.1.9; did not fix certificate expiry/rotation issue
- Upgrade to v2.2.2; certificates are rotated and the cluster is available again, everything working
- One year after Rancher v2.2.2 was installed, the Rancher Server UI becomes unavailable because the container restarts every 12 seconds
- Perform a backup of /var/lib/rancher; two certs inside the backup are expired and Rancher does not auto-renew them:
  - /var/lib/rancher/management-state/tls/localhost.crt
  - /var/lib/rancher/management-state/tls/token-node.crt
(I think you could simulate the above timeline by setting the system clock to a date in the past and then moving it forward at the appropriate time to reproduce a ~1 year jump).
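To confirm which of the two files has actually expired before trying either workaround, the validity dates can be read with the `openssl` CLI. The following is a minimal sketch, assuming `openssl` is installed on the host and the files are first copied out of the container with `docker cp` (the container ID `acd7` is the one used later in this issue). The function name `check_cert_expiry` is hypothetical.

```shell
# Hypothetical helper: report whether a PEM certificate file has already expired.
# Assumes the openssl CLI is available where this runs.
check_cert_expiry() {
  local cert="$1" end_date
  if [ ! -f "$cert" ]; then
    echo "missing: $cert"
    return 2
  fi
  # Captures a line of the form "notAfter=<expiry date>".
  end_date=$(openssl x509 -enddate -noout -in "$cert") || return 3
  # -checkend 0 exits non-zero when the certificate has already expired.
  if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null; then
    echo "valid: $cert ($end_date)"
  else
    echo "EXPIRED: $cert ($end_date)"
    return 1
  fi
}

# Example usage (container ID acd7 taken from this issue; adjust for your host):
# sudo docker cp acd7:/var/lib/rancher/management-state/tls/localhost.crt ./localhost.crt
# sudo docker cp acd7:/var/lib/rancher/management-state/tls/token-node.crt ./token-node.crt
# check_cert_expiry ./localhost.crt
# check_cert_expiry ./token-node.crt
```

`docker cp` also works while the container is stuck restarting, so the check does not require Rancher to be up.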
**Result:**
Running Rancher v2.2.9 as a single Docker container install, the Rancher Server UI becomes unavailable ("connection refused" in the browser) and the container is restarting every 12 seconds. Rancher is unusable.
**Environment information**
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): rancher/rancher v2.2.9
- Installation option (single install/HA): Single install (Docker container)
**Possible Workarounds:**
**_Workaround 1)_**
Set the system clock to a date in the past so that the certificate is not seen as expired. For me, on an Ubuntu server, I did this by disabling NTP and then setting the date and time manually:
```
sudo timedatectl set-ntp off
sudo date --set="2020-05-05 09:03:00.000"
```
This allowed the container to start up correctly and the Rancher Server UI was usable again, but this is only a short term workaround at best.
**_Workaround 2)_**
**NOTE:** I'm *not* advocating anyone use these commands on their particular installation, I'm just providing it as feedback for review by Rancher staff, because for me it solved the issue I was having...
This workaround was suggested to me by a community member on Rancher's Slack.
```
rm /etc/kubernetes/ssl/*
rm /var/lib/rancher/management-state/certs/bundle.json
rm /var/lib/rancher/management-state/tls/token-node.crt
rm /var/lib/rancher/management-state/tls/localhost.crt
```
Inside the rancher container I did not have a `/etc/kubernetes/ssl` directory so I could not run that first command. The other three files did exist (and were originally visible inside the backup of `/var/lib/rancher`).
The actual command I ran to remove the files (NOTE: again, please don't take this as advice; I'm just providing it for reference):
```
sudo docker exec -it acd7 sh -c "rm /var/lib/rancher/management-state/certs/bundle.json; rm /var/lib/rancher/management-state/tls/token-node.crt; rm /var/lib/rancher/management-state/tls/localhost.crt"
```
Then I enabled NTP again with `sudo timedatectl set-ntp on` to set the system clock back to the real/current time, and restarted the container with `sudo docker restart acd7`. Rancher started up correctly and was available again, clusters were visible (two AWS EC2 clusters attached to this server).
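Because Workaround 2 deletes files, keeping copies makes the change reversible. The sketch below wraps the same three `rm` commands and copies each file out with `docker cp` first. It assumes the container is running (for example, after the clock change in Workaround 1) and that the user can run `docker` (the issue used `sudo`). The names `backup_and_remove_certs`, `run` and `DRY_RUN` are hypothetical; with `DRY_RUN=1` the commands are only printed.

```shell
# Paths removed in Workaround 2, as they appear inside the Rancher container.
RANCHER_CERT_FILES="/var/lib/rancher/management-state/certs/bundle.json
/var/lib/rancher/management-state/tls/token-node.crt
/var/lib/rancher/management-state/tls/localhost.crt"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy each file out of the container, then delete it there.
# Usage: backup_and_remove_certs CONTAINER DEST_DIR
backup_and_remove_certs() {
  local container="$1" dest="$2" f
  run mkdir -p "$dest" || return 1
  for f in $RANCHER_CERT_FILES; do
    # Keep a copy on the host; stop before deleting if the copy fails.
    run docker cp "$container:$f" "$dest/$(basename "$f")" || return 1
    run docker exec "$container" rm "$f" || return 1
  done
}

# Example: preview first, then run for real and restart as in the issue.
# DRY_RUN=1 backup_and_remove_certs acd7 ./rancher-cert-backup
# backup_and_remove_certs acd7 ./rancher-cert-backup && sudo docker restart acd7
```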
**Other details that may be helpful:**
**Images on server**
```
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
busybox latest 020584afccce 6 months ago 1.22 MB
rancher/rancher v2.2.9 944b5893d458 6 months ago 483 MB
rancher/rancher v2.1.9 9a79850e485c 12 months ago 541 MB
rancher/rancher v2.2.2 cb5cf64e84cc 12 months ago 495 MB
alpine latest caf27325b298 15 months ago 5.53 MB
rancher/rancher v2.1.6 d14ff1038a54 15 months ago 542 MB
rancher/rancher v2.0.8 817b51fbc1fc 20 months ago 529 MB
rancher/rancher v2.0.4 975f0d475e47 22 months ago 530 MB
rancher/rancher v2.0.2 88526c7bea4e 23 months ago 521 MB
rancher/rancher v2.0.0 3141e5c66ee8 2 years ago 535 MB
```
**Rancher Logs**
When the problem first occurred, Rancher started up and then showed many "bad certificate"/"certificate has expired or is not yet valid" errors:
```
2020/05/07 07:15:22 [INFO] Rancher version v2.2.9 is starting
2020/05/07 07:15:22 [INFO] Rancher arguments {ACMEDomains:[redacted] AddLocal:auto Embedded:false KubeConfig: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false NoCACerts:false ListenConfig:<nil> AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0}
2020/05/07 07:15:22 [INFO] Listening on /tmp/log.sock
2020/05/07 07:15:22 [INFO] Running etcd --data-dir=management-state/etcd
...
I0507 07:15:24.805853 5 naming_controller.go:284] Starting NamingConditionController
I0507 07:15:24.805873 5 establishing_controller.go:73] Starting EstablishingController
2020/05/07 07:15:24 [INFO] Waiting for server to become available: Get https://localhost:6443/version?timeout=30s: x509: certificate has expired or is not yet valid
2020-05-07 07:15:24.815346 I | http: TLS handshake error from 127.0.0.1:43826: remote error: tls: bad certificate
2020-05-07 07:15:24.828573 I | http: TLS handshake error from 127.0.0.1:43876: remote error: tls: bad certificate
E0507 07:15:24.856329 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.ReplicaSet: Get https://localhost:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.857714 5 reflector.go:134] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.861677 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.862446 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.PersistentVolume: Get https://localhost:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
E0507 07:15:24.863244 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
2020-05-07 07:15:24.863976 I | http: TLS handshake error from 127.0.0.1:43888: remote error: tls: bad certificate
E0507 07:15:24.864317 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.ReplicationController: Get https://localhost:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
...
E0507 07:15:33.926893 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.StorageClass: Get https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
2020-05-07 07:15:33.926916 I | http: TLS handshake error from 127.0.0.1:44320: remote error: tls: bad certificate
E0507 07:15:33.932574 5 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.Service: Get https://localhost:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
2020-05-07 07:15:33.932599 I | http: TLS handshake error from 127.0.0.1:44324: remote error: tls: bad certificate
2020-05-07 07:15:34.822709 I | http: TLS handshake error from 127.0.0.1:44328: remote error: tls: bad certificate
2020-05-07 07:15:34.825263 I | http: TLS handshake error from 127.0.0.1:44332: remote error: tls: bad certificate
F0507 07:15:34.825392 5 controllermanager.go:184] error building controller context: failed to wait for apiserver being healthy: timed out waiting for the condition: failed to get apiserver /healthz status: Get https://localhost:6443/healthz?timeout=32s: x509: certificate has expired or is not yet valid
```
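When the container restarts every few seconds, one quick way to tell whether this certificate problem is the cause is to count the two error messages in the container logs. A minimal sketch, assuming the container ID `acd7` from Workaround 2; `count_cert_errors` is a hypothetical name and the patterns are copied from the log excerpt above.

```shell
# Hypothetical helper: count log lines (read from stdin) that match either
# certificate error seen in the excerpt above.
count_cert_errors() {
  # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
  grep -c -E 'x509: certificate has expired or is not yet valid|tls: bad certificate' || true
}

# Example usage; docker logs writes to both stdout and stderr, so merge them:
# sudo docker logs --since 10m acd7 2>&1 | count_cert_errors
```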
I also have a copy of the logs showing the first start up after Workaround 2 above was performed, I can provide this on request if needed.
Hello,
I’m running Rancher 2.4.5 as a single Docker container install and have imported my ‘clusters’. I currently get the following errors when I go to the Rancher GUI:
```
2021-06-10 17:54:41.461733 I | http: TLS handshake error from 127.0.0.1:59626: remote error: tls: bad certificate
2021-06-10 17:54:43.464172 I | http: TLS handshake error from 127.0.0.1:59628: remote error: tls: bad certificate
2021/06/10 17:54:43 [INFO] Waiting for server to become available: Get https://127.0.0.1…
```
I’m able to bring the UI back up, but the certificate is still invalid.
I cannot connect to the cluster using kubectl:
```
$ kubectl --kubeconfig=config get nodes -o wide
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2021-06-29T10:50:55-04:00 is after 2021-06-26T20:11:34Z
```
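The kubectl error reports both the current time and the certificate's expiry time. To see the validity window of the certificate the endpoint is actually serving, the server URL can be read from the same kubeconfig and the certificate fetched with `openssl s_client`. A sketch, assuming `kubectl` and `openssl` are installed, the file is the `config` used above, and its current context points at the failing server. `host_port_from_url` and `served_cert_dates` are hypothetical names, and IPv6 literal addresses are not handled.

```shell
# Hypothetical helper: turn a kubeconfig server URL into host:port (default 443).
host_port_from_url() {
  local rest="${1#*://}"   # drop the scheme, e.g. "https://"
  rest="${rest%%/*}"       # drop any path such as /k8s/clusters/<id>
  case "$rest" in
    *:*) echo "$rest" ;;
    *)   echo "$rest:443" ;;
  esac
}

# Hypothetical helper: print notBefore/notAfter of the certificate served at the
# current context's server URL in the given kubeconfig file.
served_cert_dates() {
  local kubeconfig="$1" url hp
  url=$(kubectl config view --kubeconfig="$kubeconfig" --minify \
    -o jsonpath='{.clusters[0].cluster.server}') || return 1
  hp=$(host_port_from_url "$url")
  # s_client prints the server certificate; x509 reads it from stdin.
  openssl s_client -connect "$hp" -servername "${hp%:*}" </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
}

# Example usage:
# served_cert_dates config
```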
There is also a managed cluster hosted in Rancher that is stuck in “Updating”, but the cluster is still accessible to end users and able to host apps.
How can I update this certificate?
Thanks,
Garry
I have the same problem, on the same version. After a year of running fine, the same problem occurred. I tried the utility from “How to change Rancher 2.x server-url” with `bash rancher-single-tool.sh -t'upgrade' -r'--acme-domain newhostname.company.com'`, trying to force the same domain, but on restart it reports `[INFO] Waiting for k3s to start`, and it is now in a worse state. Have you been able to solve it?