Etcd - error "tls: failed to verify client's certificate: x509

Hello,

I am new to Rancher. I installed rancher/rancher:stable (version 2.6.3) and that went fine. I then added a new cluster → Custom, checked etcd, worker and controlplane, copied the generated command and ran it on another server named “app”.

In the Rancher GUI I get the following error:

[etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [x.x.x.x] failed to report healthy. Check etcd container logs on each host for more information

When I check the etcd container log on the app server, I see the following:

2022-03-15 16:46:59.647411 I | embed: rejected connection from "139.59.190.147:50472" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")

Unfortunately I have no idea what the problem is.

Can somebody help please?

Are you adding a new node to the cluster? If so, and the node (Linux machine or VM) has had Kubernetes installed on it before, the problem might be leftover config from that earlier install.

Check the following directories:

  1. /etc/kubernetes (this is where all the certs are stored)
  2. /var/lib/etcd (where the etcd DB is located)

If this is a new node, you can clear the contents of these directories. My assumption is that Kubernetes was installed via RKE (I have only worked with installing Kubernetes with RKE).
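If you want to see what is left over before deleting anything, here is a minimal Python sketch. It only assumes the two directories listed above; the function names find_leftovers and clear_contents are made up for illustration and are not part of Rancher or RKE. Clearing is a dry run by default, so nothing is removed unless you ask for it. You will probably need to run it as root to read these directories.

```python
import shutil
from pathlib import Path

# The two directories mentioned above; adjust if your setup differs.
DEFAULT_DIRS = ("/etc/kubernetes", "/var/lib/etcd")


def find_leftovers(dirs=DEFAULT_DIRS):
    """Return {directory: sorted entry names} for every directory
    that exists and is not empty."""
    leftovers = {}
    for d in dirs:
        path = Path(d)
        if path.is_dir():
            entries = sorted(p.name for p in path.iterdir())
            if entries:
                leftovers[str(path)] = entries
    return leftovers


def clear_contents(directory, dry_run=True):
    """Remove everything inside `directory` but keep the directory itself.

    With dry_run=True (the default) nothing is deleted. In both modes the
    return value is the list of paths that were (or would be) removed.
    """
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        removed.append(str(entry))
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()        # regular file or symlink
    return removed
```

Call find_leftovers() first and look at what it reports. Only call clear_contents(path, dry_run=False) on a node that holds nothing you need, since this wipes the old certs and the etcd data.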

My solution on DigitalOcean:

  • Destroy droplet
  • Create new droplet
  • Run rancher/agent again

Now it works.