I’ve been attempting to run Rancher in a single-node Docker deployment. Everything seems fine until I create the first cluster; at that point, the UI remains in the “provisioning” state with the following error: “[etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.2.254] failed to report healthy…”.
I have done some research on this issue. It is commonly caused by re-using a node whose certificates were not properly cleaned up; however, in my case it occurs on a fresh install of the OS, including deleting and repartitioning the file systems. I assume something else in my environment is causing this, but I’ve been at this on and off for a month with exactly the same result: every attempt to create a cluster has failed.
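Since leftover state from an earlier registration is the usual suspect, a quick dry-run check before running the agent command can confirm that a freshly installed node really is clean. The sketch below only reports and deletes nothing. The directory list is an assumption (a partial set based on Rancher's node-cleanup guidance), and the optional `root` prefix argument is a hypothetical addition so the function can be exercised against a test tree.

```bash
# Minimal sketch (hypothetical helper): report leftover Kubernetes/RKE state on a node.
# ASSUMPTION: the directory list is partial, based on Rancher's node-cleanup guidance.
# Usage: list_leftover_state [ROOT]   (ROOT defaults to "", i.e. the real filesystem)
list_leftover_state() {
  local root="${1:-}"
  local dir
  for dir in /etc/kubernetes /etc/cni /opt/cni /opt/rke /var/lib/etcd \
             /var/lib/kubelet /var/lib/cni /var/lib/rancher \
             /var/log/containers /var/log/pods; do
    # Print only directories that exist and are non-empty; nothing is removed
    if [ -d "$root$dir" ] && [ -n "$(ls -A "$root$dir" 2>/dev/null)" ]; then
      echo "$dir"
    fi
  done
}
```

Run on the host with no argument before executing the agent command; empty output means none of the listed directories hold data.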
The etcd container is logging:
2022-02-10 17:54:49.448020 I | embed: rejected connection from "192.168.2.254:37626" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")
The Rancher server is logging:
2022/02/10 17:56:14 [WARNING] [etcd] host [192.168.2.254] failed to check etcd health: failed to get /health for host [192.168.2.254]: Get https://192.168.2.254:2379/health: remote error: tls: bad certificate
2022/02/10 17:56:14 [ERROR] cluster [c-t45b7] provisioning: [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.2.254] failed to report healthy. Check etcd container logs on each host for more information
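The etcd message reads as a signature mismatch rather than a missing CA: the client certificate names `kube-ca` as its issuer, but its signature does not verify against the `kube-ca` certificate etcd trusts, which suggests two different CAs sharing that name. One way to test that theory on disk with `openssl` is sketched below; `verify_against_ca` is a hypothetical helper name, and the paths in the example comment assume the /etc/kubernetes/ssl layout listed in this post.

```bash
# Minimal sketch (hypothetical helper): does CERT_PEM verify against CA_PEM?
# Usage: verify_against_ca CA_PEM CERT_PEM
verify_against_ca() {
  local ca="$1" cert="$2"
  # openssl verify exits 0 only if the signature chain up to CA_PEM checks out
  if openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "OK: $cert verifies against $ca"
  else
    echo "FAIL: $cert does not verify against $ca"
    return 1
  fi
}

# Example on the node (paths assumed from the certificate listing in this post):
#   verify_against_ca /etc/kubernetes/ssl/kube-ca.pem \
#                     /etc/kubernetes/ssl/kube-etcd-192-168-2-254.pem
```

If the on-disk pair verifies, the mismatched certificate is more likely the one presented by the connecting client, which this check does not cover.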
Environment:
- Hardware: Intel i7-6700K, 4 cores (8 vCPUs), 32 GB RAM, 2 TB HDD, 220 GB SSD (bare metal).
- CentOS Linux release 7.9.2009, minimal install, NTP (chronyd), UTC timezone
- OS tuning: firewalld disabled, SELinux disabled, swap disabled, br_netfilter loaded, net.bridge.bridge-nf-call-iptables=1
- Host name: pc-mpi00482 (no domain and no DNS entry; I did try a DNS server in a previous attempt, but it did not help)
- Docker 20.10.7, installed via https://releases.rancher.com/install-docker/20.10.sh (with the current user added to the docker group)
- Rancher v2.6.3, installed via:
docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:v2.6.3
- Kubernetes v1.21.9-rancher-1-1 cluster, created via the Rancher UI…
docker run -d --privileged --restart=unless-stopped --net=host \
-v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
rancher/rancher-agent:v2.6.3 --server https://192.168.2.254 \
--token svrmr76bx5lwdkm6654gsdsjqngq6mcbsq7nnsvqz9c85sp59kzlmt \
--ca-checksum a4f6526a9dc51f94ace3217b3c379ca1de12462d06693df4cac22108d7c00766 \
--etcd --controlplane --worker
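Because the OS tuning above is applied by hand, a quick re-check after a reboot can rule out settings that did not persist (for example, br_netfilter not being loaded at boot). The sketch below reads the values straight from /proc; `preflight` is a hypothetical helper, and its optional `root` prefix exists only so it can be tested against a fake tree.

```bash
# Minimal sketch (hypothetical helper): re-check swap, br_netfilter and the bridge sysctl.
# Usage: preflight [ROOT]   (ROOT defaults to "", i.e. the real /proc)
preflight() {
  local root="${1:-}" ok=0
  # /proc/swaps holds a header line plus one line per active swap device
  if [ "$(wc -l < "$root/proc/swaps")" -le 1 ]; then
    echo "swap: off"
  else
    echo "swap: ON"; ok=1
  fi
  # Loaded modules are listed in /proc/modules (a built-in module would not appear here)
  if grep -q '^br_netfilter ' "$root/proc/modules" 2>/dev/null; then
    echo "br_netfilter: loaded"
  else
    echo "br_netfilter: MISSING"; ok=1
  fi
  # sysctl net.bridge.bridge-nf-call-iptables is exposed at this /proc/sys path
  if [ "$(cat "$root/proc/sys/net/bridge/bridge-nf-call-iptables" 2>/dev/null)" = "1" ]; then
    echo "bridge-nf-call-iptables: 1"
  else
    echo "bridge-nf-call-iptables: NOT 1"; ok=1
  fi
  return "$ok"
}
```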
All certificates are…
/etc/kubernetes/ssl/:
-rw-------. 1 root root 1675 Feb 10 17:50 kube-apiserver-key.pem
-rw-------. 1 root root 1306 Feb 10 17:50 kube-apiserver.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-apiserver-proxy-client-key.pem
-rw-------. 1 root root 1151 Feb 10 17:50 kube-apiserver-proxy-client.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-apiserver-requestheader-ca-key.pem
-rw-------. 1 root root 1123 Feb 10 17:50 kube-apiserver-requestheader-ca.pem
-rw-------. 1 root root 1679 Feb 10 17:50 kube-ca-key.pem
-rw-------. 1 root root 1058 Feb 10 17:50 kube-ca.pem
-rw-------. 1 root root 517 Feb 10 17:50 kubecfg-kube-apiserver-proxy-client.yaml
-rw-------. 1 root root 533 Feb 10 17:50 kubecfg-kube-apiserver-requestheader-ca.yaml
-rw-------. 1 root root 501 Feb 10 17:50 kubecfg-kube-controller-manager.yaml
-rw-------. 1 root root 445 Feb 10 17:50 kubecfg-kube-node.yaml
-rw-------. 1 root root 449 Feb 10 17:50 kubecfg-kube-proxy.yaml
-rw-------. 1 root root 465 Feb 10 17:50 kubecfg-kube-scheduler.yaml
-rw-------. 1 root root 1675 Feb 10 17:50 kube-controller-manager-key.pem
-rw-------. 1 root root 1107 Feb 10 17:50 kube-controller-manager.pem
-rw-------. 1 root root 1679 Feb 10 17:50 kube-etcd-192-168-2-254-key.pem
-rw-------. 1 root root 1298 Feb 10 17:50 kube-etcd-192-168-2-254.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-node-key.pem
-rw-------. 1 root root 1115 Feb 10 17:50 kube-node.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-proxy-key.pem
-rw-------. 1 root root 1090 Feb 10 17:50 kube-proxy.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-scheduler-key.pem
-rw-------. 1 root root 1094 Feb 10 17:50 kube-scheduler.pem
-rw-------. 1 root root 1675 Feb 10 17:50 kube-service-account-token-key.pem
-rw-------. 1 root root 1277 Feb 10 17:50 kube-service-account-token.pem
/etc/kubernetes/ssl/certs:
-rw-------. 1 root root 635 Feb 10 17:49 serverca
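Given this listing, a further consistency check is whether `kube-ca.pem` and `kube-ca-key.pem` actually belong together: a CA certificate paired with a different key would yield certificates that fail to verify against it. The hypothetical helper below compares the public key embedded in the certificate with the one derived from the private key.

```bash
# Minimal sketch (hypothetical helper): do a certificate and a private key match?
# Usage: cert_matches_key CERT_PEM KEY_PEM   (exit 0 = match, 1 = mismatch, 2 = read error)
cert_matches_key() {
  local cert_pub key_pub
  # Public key (SubjectPublicKeyInfo PEM) embedded in the certificate
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  # Public key derived from the private key, written in the same PEM format
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}

# Example on the node:
#   cd /etc/kubernetes/ssl
#   cert_matches_key kube-ca.pem kube-ca-key.pem && echo "kube-ca: cert and key match"
#   openssl x509 -in kube-etcd-192-168-2-254.pem -noout -subject -issuer -dates
```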
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
88cb87e634f1 rancher/mirrored-coreos-etcd:v3.4.16-rancher1 "/usr/local/bin/etcd…" 5 hours ago Up 52 minutes etcd
359d32e78ce7 rancher/rancher-agent:v2.6.3 "run.sh --server htt…" 5 hours ago Up 5 hours upbeat_chandrasekhar
8416ce813921 rancher/rancher:v2.6.3 "entrypoint.sh" 6 hours ago Up 5 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp happy_ptolemy
Note: etcd is not running in privileged mode, while the Rancher server and agent are.
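To see whether the TLS rejections come from one client or several, the etcd log can be summarized by source address. This is a sketch that assumes the `rejected connection from "IP:PORT"` format shown in the log excerpt above; `summarize_tls_rejections` is a hypothetical name.

```bash
# Minimal sketch (hypothetical helper): count etcd TLS rejections per client IP.
# Reads etcd log lines on stdin; assumes the 'rejected connection from "IP:PORT"' format.
summarize_tls_rejections() {
  # Keep the quoted "ip:port", strip the port, then count occurrences per IP
  grep -o 'rejected connection from "[^"]*"' \
    | sed -E 's/.*"([^"]*):[0-9]+"/\1/' \
    | sort | uniq -c
}

# Example on the node (container name "etcd" as shown by docker ps):
#   docker logs etcd 2>&1 | summarize_tls_rejections
```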
Also, running https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh identified the following issues, but it’s not clear to me whether they are significant:
- warning: /proc/config.gz does not exist, searching other paths for kernel config …
- (RHEL7/CentOS7: User namespaces disabled; add ‘user_namespace.enable=1’ to boot command line)
- CONFIG_RESOURCE_COUNTERS: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
- CONFIG_IPVLAN: missing
- CONFIG_AUFS_FS: missing
- /dev/zfs: missing
- zfs command: missing
- zpool command: missing
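Since there is no /proc/config.gz on this system, the script falls back to other kernel config locations. Individual options can be looked up directly with a small helper like the one below; the default path `/boot/config-$(uname -r)` is an assumption that matches stock CentOS 7 kernels, and `kernel_opt` is a hypothetical name.

```bash
# Minimal sketch (hypothetical helper): print the value of a kernel config option.
# ASSUMPTION: the config file lives at /boot/config-$(uname -r) unless a path is given.
# Usage: kernel_opt CONFIG_NAME [CONFIG_FILE]
kernel_opt() {
  local opt="$1" cfg="${2:-/boot/config-$(uname -r)}"
  local val
  # Set options look like CONFIG_X=y or CONFIG_X=m; unset ones are "# CONFIG_X is not set"
  val=$(grep -E "^${opt}=" "$cfg" | cut -d= -f2)
  echo "${opt}: ${val:-missing}"
}

# Example on the node:
#   kernel_opt CONFIG_OVERLAY_FS
#   kernel_opt CONFIG_SECURITY_APPARMOR
```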
Any help is appreciated.
Thanks.