vSphere provisioning problems using Rancher in HA


#1

I’ve been trying to automatically provision clusters using vSphere, and it works exactly as expected with a Single Node installation of Rancher. But when I provision from a Multi Node installation of Rancher, I get TLS handshake errors. Does anyone have any ideas?

Single Node installation (WORKING)
If I create a virtual machine in VMware with Ubuntu 18.04.1 LTS, install the latest Docker, and then run:

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:stable

and take it from there (set up a Node Template for vSphere and start provisioning a cluster), everything works perfectly.

Multi Node installation (NOT WORKING)
I went through all the steps in the High Availability (HA) Install guide found here: https://rancher.com/docs/rancher/v2.x/en/installation/ha/

I used cert-manager to issue Rancher Generated Certificates.

When I add a Node Template for vSphere (same as single node) and start provisioning a cluster, everything works up to the point where it starts the health checks for kube-apiserver.

The first error I get is:
[controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [172.16.65.232]: Get https://localhost:6443/healthz: can not build dialer to c-pjwt7:m-2r5ht, log: I0201 14:21:51.033261 1 storage_rbac.go:276] created rolebinding.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public

After that it retries and the next error I get is:
[controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [172.16.65.232]: Get https://localhost:6443/healthz: can not build dialer to c-pjwt7:m-2r5ht, log: I0201 14:26:17.565248 1 log.go:172] http: TLS handshake error from 172.16.65.232:33397: EOF

I can’t work out what the difference between the two setups is. Could someone please help me find out what I need to do to make this work?

Thank you.


#2

Never mind.

My problem was that the Rancher cluster doing the provisioning couldn’t be reached.
In my case, I had to add an entry to /etc/hosts for the hostname of my external Rancher URL. Then it all worked!
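
For anyone who lands here with the same error, the fix above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used in this thread: the hostname rancher.example.com, the IP 203.0.113.10, and the two helper function names are all placeholders, and the post does not say which machine needed the entry, so run it on whichever host fails to resolve the Rancher hostname.

```shell
#!/bin/sh
# Placeholder values (assumptions): use your own Rancher hostname and the
# IP address of the load balancer in front of the Rancher HA cluster.
RANCHER_HOST="rancher.example.com"
RANCHER_IP="203.0.113.10"

# Succeeds if the hosts file ($1) already maps the hostname ($2)
# on a line that is not commented out.
hosts_has_entry() {
    awk -v host="$2" '
        { sub(/#.*/, "") }    # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Appends "IP hostname" to the hosts file ($1) unless the hostname ($3)
# is already mapped there, so running it twice adds only one line.
ensure_hosts_entry() {
    hosts_has_entry "$1" "$3" || printf '%s %s\n' "$2" "$3" >> "$1"
}

# Example usage (as root, on the machine that cannot resolve the hostname):
#   ensure_hosts_entry /etc/hosts "$RANCHER_IP" "$RANCHER_HOST"
#   getent hosts "$RANCHER_HOST"    # should now print the IP and hostname
```

The helpers take the hosts file path as an argument so they can be tried against a copy first, before touching the real /etc/hosts.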