I am unable to create a new cluster. Every time I try, I get different error messages.
Rancher version: 2.1.4
All three nodes are Ubuntu 16.10 VMs, up to date with the latest packages.
I decided to “clean slate” everything:
Stopped every container with ‘docker stop’ on each node.
Ran ‘docker system prune -a’ on each node (commands sketched after this list).
In Rancher, I deleted each node and then the cluster.
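The node-side part of that looked roughly like this (a sketch of what I ran on each node; ‘docker stop’ needs the container IDs, hence the subshell):

```
# On each node: stop every running container, then remove stopped
# containers, networks, and all images
docker stop $(docker ps -q)
docker system prune -a
```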
I rebuilt the cluster:
In Rancher, created a new cluster:
“custom” node type
“Canal” network provider
“custom” cloud provider
I chose each node to serve all three roles: etcd, control plane, and worker
I copied and pasted the docker run command from the Rancher UI into each VM.
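For reference, the command the Rancher UI generates looks roughly like this (a sketch only; the server URL, token, and checksum below are placeholders, and the agent tag matches the Rancher version):

```
sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.1.4 \
  --server https://<rancher-server-url> \
  --token <registration-token> \
  --ca-checksum <ca-checksum> \
  --etcd --controlplane --worker
```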
Rancher reports three nodes, but I am getting this error:
This cluster is currently Provisioning; areas that interact directly with it will not be available until the API is ready.
[network] Host [10.10.55.223] is not able to connect to the following ports: [10.10.55.224:2379, 10.10.55.224:2380]. Please check network policies and firewall rules
I have deleted everything and tried this a few times; every attempt ends with an error. What am I doing wrong?
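One thing worth checking is basic reachability on the etcd ports between the nodes (a sketch, using the node IPs from the error above and assuming a ufw-based firewall; adjust the subnet to your environment):

```
# From 10.10.55.223: can the etcd client/peer ports on 10.10.55.224 be reached?
nc -zv 10.10.55.224 2379
nc -zv 10.10.55.224 2380

# If ufw is enabled, allow the etcd ports between the nodes
sudo ufw allow from 10.10.55.0/24 to any port 2379:2380 proto tcp
```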
Frustrated, I decided to completely start over with all new installs:
I built a new Rancher server (2.1.15, since I saw some cluster bugs had been fixed).
I built 3 new Ubuntu 16.04 server VMs and installed Docker v17.03.2 (install sketch after this list).
I built a new cluster in Rancher using “Custom” nodes. Each node serves all three roles.
I copied and pasted the Rancher provisioning script into each terminal and left it to run.
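For the Docker install, pinning that version on Ubuntu 16.04 can be done with Rancher's version-pinned install script (assuming the standard Rancher-hosted script; this may differ from how it was installed here):

```
# Install Docker 17.03 on Ubuntu 16.04 via Rancher's pinned install script
curl https://releases.rancher.com/install-docker/17.03.sh | sh
```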
Six hours later, the new cluster was in an error state: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [10.10.55.222]: Get https://localhost:6443/healthz: read tcp [::1]:43414->[::1]:6443: read: connection reset by peer, log: I0129 15:07:03.469022 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
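A way to dig into that error on the affected node (a sketch, using the container name RKE gives the API server):

```
# On 10.10.55.222: is the kube-apiserver container up, or restart-looping?
docker ps -a --filter name=kube-apiserver

# Its logs usually show why connections on 6443 get reset
docker logs --tail 100 kube-apiserver

# Probe the health endpoint locally
curl -k https://localhost:6443/healthz
```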
I rebooted each node and waited, watching Rancher cycle through the standard cluster provisioning messages.
Re-using RKE-provisioned Kubernetes nodes is not as simple as just stopping all the containers. I found a script from superseb that cleaned up the nodes and made them work again:
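I won't paste the whole thing, but it boils down to something like this (a rough sketch of the kind of cleanup it does, not the exact gist):

```
#!/bin/bash
# Rough outline of an RKE node cleanup (NOT superseb's exact script)

# Remove every container and volume left behind by Rancher/RKE
docker rm -f $(docker ps -qa) 2>/dev/null
docker volume rm $(docker volume ls -q) 2>/dev/null

# Unmount kubelet-related mounts
for m in $(mount | grep '/var/lib/kubelet' | awk '{print $3}'); do
  umount "$m"
done

# Remove the state directories so the node registers cleanly next time
rm -rf /etc/kubernetes /etc/cni /opt/cni \
       /var/lib/etcd /var/lib/cni /var/lib/kubelet /var/lib/rancher \
       /var/run/calico
```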