Since the upgrade to Rancher v2.1.1, I have been unable to successfully deploy a cluster on the first attempt.
Each time, the node VMs that get brought up may or may not get an IP address via DHCP. A reboot of the server doesn’t help. You have to power the server down and back up to get an IP.
Even if they do get an IP, Rancher doesn’t successfully deploy everything to the nodes, leaving the cluster down.
Seeing errors such as: [network] Host [10.85.175.29] is not able to connect to the following ports: [10.85.175.36:2379]. Please check network policies and firewall rules
Or when looking at the etcd logs:
rafthttp: request cluster ID mismatch (got c8ba020dc536a627 want 6d31c0a3ea602366)
There shouldn’t be a cluster ID mismatch when these are brand-new VMs created by Rancher’s cluster creation process using RancherOS.
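To compare what each node reports, the two IDs can be pulled out of the etcd log lines. This is only a rough sketch: the function name `parse_mismatch` is my own, and the pattern is based solely on the one log line quoted above, so other etcd versions may format the message differently.

```python
import re

# Pattern modelled on the single log line quoted in this post (an assumption,
# not an official etcd log format specification).
MISMATCH = re.compile(
    r"request cluster ID mismatch \(got (?P<got>[0-9a-f]+) want (?P<want>[0-9a-f]+)\)"
)


def parse_mismatch(line):
    """Return (got, want) cluster IDs from a log line, or None if no match."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    return match.group("got"), match.group("want")
```

Running this over the etcd logs from every node should show whether all of them disagree with one another or only one node carries a different ID.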
Rancher Agent logs:
Failed to connect to proxy. web socket bad handshake
Error: failed to start containers: Kubelet
- sleep 2
- docker start Kubelet
Error: response from daemon: {"message":"No such container: kubelet"}
It’s all been wildly inconsistent, and it leaves me unable to bring up new clusters.
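For the port error above, a plain TCP check run from the affected host can at least confirm whether the etcd port is reachable at all. This is a minimal sketch, not anything Rancher itself runs: `can_connect` is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and opens a TCP socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("10.85.175.36", 2379)` on 10.85.175.29 would show whether the connection is refused or times out at the network level, or whether the port is open and the problem lies elsewhere.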