Hey hey,
Sorry to bother you, but I’m having an issue on an old cluster that is still in production: the apps are still working, but the Rancher UI itself is not. It is hosted on RancherOS, and since it’s in PROD I was hoping to hear from someone who actually knows what the issue is, rather than trying things myself and breaking PROD.
This node is used solely for the Rancher UI, which connects to other clusters that have the same installation.
[rancher@rancher ~]$ sudo system-docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f021e68356e0 rancher/os-console:v1.5.6 "/usr/bin/ros entr..." 4 months ago Up 4 months console
b81a1b6aac10 rancher/os-docker:19.03.11 "ros user-docker" 12 months ago Up 4 months docker
058b0f8b1ebb rancher/os-base:v1.5.6 "/usr/bin/ros entr..." 12 months ago Up 4 months ntp
cc61faa9647a rancher/os-base:v1.5.6 "/usr/bin/ros entr..." 12 months ago Up 4 months network
ef17d6ffb9eb rancher/os-base:v1.5.6 "/usr/bin/ros entr..." 12 months ago Up 4 months udev
2e7b4e362c13 rancher/container-crontab:v0.4.0 "container-crontab" 12 months ago Up 4 months system-cron
ee7aaf96d97d rancher/os-syslog:v1.5.6 "/usr/bin/entrypoi..." 12 months ago Up 4 months syslog
7fb2f561a810 rancher/os-acpid:v1.5.6 "/usr/bin/ros entr..." 12 months ago Up 4 months acpid
[rancher@rancher ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
261d4c9fee6e rancher/rancher:latest "entrypoint.sh --acm…" 10 months ago Up 11 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp flamboyant_wing
The rancher/rancher:latest logs are as follows:
I1206 14:02:29.327047 38 shared_informer.go:230] Caches are synced for HPA
W1206 14:02:29.328741 38 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="local-node" does not exist
I1206 14:02:29.345929 38 shared_informer.go:230] Caches are synced for service account
I1206 14:02:29.353691 38 shared_informer.go:230] Caches are synced for node
I1206 14:02:29.353723 38 range_allocator.go:172] Starting range CIDR allocator
I1206 14:02:29.353727 38 shared_informer.go:223] Waiting for caches to sync for cidrallocator
I1206 14:02:29.353731 38 shared_informer.go:230] Caches are synced for cidrallocator
I1206 14:02:29.357886 38 shared_informer.go:230] Caches are synced for TTL
I1206 14:02:29.375084 38 shared_informer.go:230] Caches are synced for namespace
I1206 14:02:29.377548 38 shared_informer.go:230] Caches are synced for certificate-csrsigning
I1206 14:02:29.385322 38 shared_informer.go:230] Caches are synced for endpoint_slice
I1206 14:02:29.387341 38 shared_informer.go:230] Caches are synced for GC
I1206 14:02:29.389206 38 shared_informer.go:230] Caches are synced for deployment
I1206 14:02:29.395017 38 shared_informer.go:230] Caches are synced for PV protection
I1206 14:02:29.407208 38 shared_informer.go:230] Caches are synced for ReplicaSet
I1206 14:02:29.414114 38 shared_informer.go:230] Caches are synced for certificate-csrapproving
I1206 14:02:29.419590 38 shared_informer.go:230] Caches are synced for job
E1206 14:02:29.424622 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.461912 38 shared_informer.go:230] Caches are synced for ReplicationController
I1206 14:02:29.483138 38 shared_informer.go:230] Caches are synced for endpoint
I1206 14:02:29.510171 38 shared_informer.go:230] Caches are synced for disruption
I1206 14:02:29.510237 38 disruption.go:339] Sending events to api server.
E1206 14:02:29.524771 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.582355 38 log.go:172] http: TLS handshake error from 127.0.0.1:60144: remote error: tls: bad certificate
E1206 14:02:29.582374 7 leaderelection.go:321] error retrieving resource lock kube-system/cattle-controllers: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:29Z is after 2021-11-22T16:54:55Z
I1206 14:02:29.600870 38 log.go:172] http: TLS handshake error from 127.0.0.1:60148: remote error: tls: bad certificate
E1206 14:02:29.600902 38 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: Get https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/local-node?timeout=10s: x509: certificate has expired or is not yet valid
E1206 14:02:29.625587 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.659975 38 shared_informer.go:230] Caches are synced for ClusterRoleAggregator
E1206 14:02:29.725745 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.752689 38 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I1206 14:02:29.760026 38 kubelet_node_status.go:70] Attempting to register node local-node
I1206 14:02:29.761301 38 log.go:172] http: TLS handshake error from 127.0.0.1:60166: remote error: tls: bad certificate
E1206 14:02:29.761345 38 kubelet_node_status.go:92] Unable to register node "local-node" with API server: Post https://127.0.0.1:6443/api/v1/nodes: x509: certificate has expired or is not yet valid
I1206 14:02:29.816976 38 shared_informer.go:230] Caches are synced for persistent volume
E1206 14:02:29.825875 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.833350 38 shared_informer.go:230] Caches are synced for PVC protection
I1206 14:02:29.881269 38 shared_informer.go:230] Caches are synced for expand
I1206 14:02:29.891124 38 shared_informer.go:230] Caches are synced for stateful set
I1206 14:02:29.893302 38 shared_informer.go:230] Caches are synced for attach detach
I1206 14:02:29.911933 38 shared_informer.go:230] Caches are synced for taint
I1206 14:02:29.912011 38 node_lifecycle_controller.go:1433] Initializing eviction metric for zone:
W1206 14:02:29.912093 38 node_lifecycle_controller.go:1048] Missing timestamp for Node local-node. Assuming now as a timestamp.
I1206 14:02:29.912097 38 taint_manager.go:187] Starting NoExecuteTaintManager
I1206 14:02:29.912139 38 node_lifecycle_controller.go:1199] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
I1206 14:02:29.912204 38 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"local-node", UID:"ddc88ab3-ad4e-49af-90c3-80c6609f2a45", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node local-node event: Registered Node local-node in Controller
E1206 14:02:29.925993 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:29.955987 38 shared_informer.go:230] Caches are synced for daemon sets
I1206 14:02:29.980826 38 log.go:172] http: TLS handshake error from 127.0.0.1:60174: remote error: tls: bad certificate
E1206 14:02:30.026129 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.126272 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.226424 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.326551 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:30.347250 38 request.go:621] Throttling request took 1.040385828s, request: GET:https://127.0.0.1:6444/apis/node.k8s.io/v1beta1?timeout=32s
E1206 14:02:30.426719 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.526860 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.627012 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.727209 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:30.800123 38 shared_informer.go:223] Waiting for caches to sync for garbage collector
E1206 14:02:30.827346 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:30.927510 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:30.973886 38 log.go:172] http: TLS handshake error from 127.0.0.1:60200: remote error: tls: bad certificate
time="2021-12-06T14:02:30.973951284Z" level=info msg="waiting for node local-node: Get https://127.0.0.1:6443/api/v1/nodes/local-node: x509: certificate has expired or is not yet valid"
I1206 14:02:30.980388 38 log.go:172] http: TLS handshake error from 127.0.0.1:60202: remote error: tls: bad certificate
E1206 14:02:31.027812 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:31.127254 38 log.go:172] http: TLS handshake error from 127.0.0.1:60208: remote error: tls: bad certificate
E1206 14:02:31.127419 7 reflector.go:128] pkg/mod/github.com/rancher/client-go@v1.19.0-rancher.2/tools/cache/reflector.go:157: Failed to watch *summary.SummarizedObject: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/apiregistration.k8s.io/v1/apiservices?limit=500&resourceVersion=0&timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:31Z is after 2021-11-22T16:54:55Z
E1206 14:02:31.127947 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.228077 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.328215 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.428344 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:31.470413 38 log.go:172] http: TLS handshake error from 127.0.0.1:60220: remote error: tls: bad certificate
E1206 14:02:31.470456 7 reflector.go:128] pkg/mod/github.com/rancher/client-go@v1.19.0-rancher.2/tools/cache/reflector.go:157: Failed to watch *summary.SummarizedObject: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/api/v1/podtemplates?resourceVersion=200290079&timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:31Z is after 2021-11-22T16:54:55Z
E1206 14:02:31.528489 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:31.582257 38 log.go:172] http: TLS handshake error from 127.0.0.1:60222: remote error: tls: bad certificate
E1206 14:02:31.582278 7 leaderelection.go:321] error retrieving resource lock kube-system/cattle-controllers: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:31Z is after 2021-11-22T16:54:55Z
E1206 14:02:31.628646 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.728823 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.828990 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:31.929138 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:31.980692 38 log.go:172] http: TLS handshake error from 127.0.0.1:60232: remote error: tls: bad certificate
E1206 14:02:32.029318 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.129436 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:32.215834 38 log.go:172] http: TLS handshake error from 127.0.0.1:60240: remote error: tls: bad certificate
E1206 14:02:32.229587 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:32.245322 38 log.go:172] http: TLS handshake error from 127.0.0.1:60242: remote error: tls: bad certificate
E1206 14:02:32.245571 7 reflector.go:128] pkg/mod/github.com/rancher/client-go@v1.19.0-rancher.2/tools/cache/reflector.go:157: Failed to watch *v3.ClusterRoleTemplateBinding: failed to list *v3.ClusterRoleTemplateBinding: Get "https://127.0.0.1:6443/apis/management.cattle.io/v3/clusterroletemplatebindings?resourceVersion=200290094": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:32Z is after 2021-11-22T16:54:55Z
E1206 14:02:32.329721 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.429869 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.530048 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.630198 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:32.640085 38 shared_informer.go:230] Caches are synced for resource quota
I1206 14:02:32.700370 38 shared_informer.go:230] Caches are synced for garbage collector
I1206 14:02:32.704422 38 shared_informer.go:230] Caches are synced for garbage collector
I1206 14:02:32.704471 38 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I1206 14:02:32.714892 38 shared_informer.go:230] Caches are synced for resource quota
I1206 14:02:32.723345 38 log.go:172] http: TLS handshake error from 127.0.0.1:60256: remote error: tls: bad certificate
E1206 14:02:32.723417 7 reflector.go:128] pkg/mod/github.com/rancher/client-go@v1.19.0-rancher.2/tools/cache/reflector.go:157: Failed to watch *v3.SourceCodeCredential: failed to list *v3.SourceCodeCredential: Get "https://127.0.0.1:6443/apis/project.cattle.io/v3/sourcecodecredentials?limit=500": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:32Z is after 2021-11-22T16:54:55Z
E1206 14:02:32.730351 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.830488 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:32.930620 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:32.950454 38 log.go:172] http: TLS handshake error from 127.0.0.1:60258: remote error: tls: bad certificate
time="2021-12-06T14:02:32.950484448Z" level=error msg="Unable to watch for tunnel endpoints: Get https://127.0.0.1:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&resourceVersion=0&watch=true: x509: certificate has expired or is not yet valid"
I1206 14:02:32.981698 38 log.go:172] http: TLS handshake error from 127.0.0.1:60260: remote error: tls: bad certificate
I1206 14:02:32.982220 38 log.go:172] http: TLS handshake error from 127.0.0.1:60262: remote error: tls: bad certificate
time="2021-12-06T14:02:32.982259189Z" level=info msg="waiting for node local-node: Get https://127.0.0.1:6443/api/v1/nodes/local-node: x509: certificate has expired or is not yet valid"
E1206 14:02:33.016995 38 eviction_manager.go:260] eviction manager: failed to get summary stats: failed to get node info: node "local-node" not found
E1206 14:02:33.030759 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:33.130907 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:33.139800 38 log.go:172] http: TLS handshake error from 127.0.0.1:60264: remote error: tls: bad certificate
E1206 14:02:33.139922 7 reflector.go:128] pkg/mod/github.com/rancher/client-go@v1.19.0-rancher.2/tools/cache/reflector.go:157: Failed to watch *summary.SummarizedObject: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/monitoring.coreos.com/v1/prometheuses?limit=500&resourceVersion=0&timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-06T14:02:33Z is after 2021-11-22T16:54:55Z
E1206 14:02:33.231098 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:33.331303 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:33.431495 38 kubelet.go:2268] node "local-node" not found
E1206 14:02:33.531726 38 kubelet.go:2268] node "local-node" not found
I1206 14:02:33.579713 7 leaderelection.go:278] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
E1206 14:02:33.579824 7 leaderelection.go:297] Failed to release lock: resource name may not be empty
2021/12/06 14:02:33 [FATAL] leaderelection lost for cattle-controllers
Looking at this, I understand that an expired certificate is causing the container to fail, but I can’t tell which certificate has expired or how to regenerate it without breaking the working clusters.
I’ve already tried running the following and restarting the container, but it didn’t solve the problem:
sudo docker exec -it 261d4c9fee6e sh -c "mv /var/lib/rancher/k3s/server/tls/dynamic-cert.json /var/lib/rancher/k3s/server/tls/dynamic-cert.json.v2"
sudo docker restart 261d4c9fee6e
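In case it helps narrow down which certificate has expired, here is a minimal sketch of how the expiry dates could be listed. It assumes openssl is available wherever it runs and that the relevant certificates are the *.crt files under /var/lib/rancher/k3s/server/tls (the directory from the command above). The function name list_cert_expiry is only illustrative, and I have not confirmed that this covers every certificate k3s uses.

```shell
#!/bin/sh
# Sketch: print the expiry date of every *.crt file in a directory
# and flag the ones that have already expired.
# Assumption: openssl is installed where this runs.
# list_cert_expiry is a made-up helper name, not part of Rancher or k3s.
list_cert_expiry() {
    dir="$1"
    for crt in "$dir"/*.crt; do
        # Skip the unexpanded glob when the directory has no .crt files
        [ -f "$crt" ] || continue
        # -enddate prints "notAfter=<date>" for the first certificate in the file
        end=$(openssl x509 -noout -enddate -in "$crt" 2>/dev/null) || continue
        # -checkend 0 exits non-zero if the certificate has already expired
        if openssl x509 -noout -checkend 0 -in "$crt" >/dev/null 2>&1; then
            state="ok"
        else
            state="EXPIRED"
        fi
        printf '%s\t%s\t%s\n' "$state" "${end#notAfter=}" "$crt"
    done
}
```

It could be run against a copy of the directory rather than the live files, for example with sudo docker cp 261d4c9fee6e:/var/lib/rancher/k3s/server/tls ./k3s-tls followed by list_cert_expiry ./k3s-tls. Copying out only reads from the container. Note that only the top-level *.crt files are checked, and only the first certificate in each file, so subdirectories and bundled chains would need a separate look.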
Does anyone know how this could be fixed?
Thanks