Installing v2.2.2 on vSphere crashes the Rancher Docker container

TimothyMeade · May 12, 2019, 7:37pm

We have been running Rancher for about 6 months.

We recently updated to the latest Rancher Docker image, and it will no longer install a cluster on vSphere 6.5. Everything we have tried has ended with the Rancher Docker container crashing.

docker logs -f shows where it fails. It's the same spot every time:

2019/05/12 19:30:50 [INFO] (k8s-fm-20190512-master3) Waiting for VMware Tools to come online…
2019/05/12 19:30:57 [INFO] stdout: (k8s-fm-20190512-worker3) adding network: Private Network
2019/05/12 19:30:57 [INFO] (k8s-fm-20190512-worker3) adding network: Private Network
2019/05/12 19:30:58 [INFO] stdout: (k8s-fm-20190512-worker2) adding network: Private Network
2019/05/12 19:30:58 [INFO] (k8s-fm-20190512-worker2) adding network: Private Network
2019/05/12 19:30:59 [INFO] stdout: (k8s-fm-20190512-master1) adding network: Private Network
2019/05/12 19:30:59 [INFO] (k8s-fm-20190512-master1) adding network: Private Network
2019-05-12 19:31:00.503793 W | wal: sync duration of 2.715061271s, expected less than 1s
2019-05-12 19:31:01.795896 W | etcdserver: apply entries took too long [1.291324389s for 6 entries]
2019-05-12 19:31:01.795959 W | etcdserver: avoid queries with large range/delete range!
I0512 19:31:01.797276 6 trace.go:76] Trace[1764331517]: “GuaranteedUpdate etcd3: *unstructured.Unstructured” (started: 2019-05-12 19:30:59.8600067 +0000 UTC m=+229.067950235) (total time: 1.937155923s):
Trace[1764331517]: [1.936691888s] [1.935787s] Transaction committed
I0512 19:31:01.797279 6 trace.go:76] Trace[428383260]: “GuaranteedUpdate etcd3: *unstructured.Unstructured” (started: 2019-05-12 19:30:57.884785944 +0000 UTC m=+227.092729469) (total time: 3.912399887s):
Trace[428383260]: [3.91120186s] [3.91003126s] Transaction committed
I0512 19:31:01.797799 6 trace.go:76] Trace[1726044098]: “Update /apis/management.cattle.io/v3/namespaces/c-2m7xj/nodes/m-qmssh” (started: 2019-05-12 19:30:59.859130557 +0000 UTC m=+229.067074078) (total time: 1.938621682s):
Trace[1726044098]: [1.938206584s] [1.937449922s] Object stored in database
I0512 19:31:01.798475 6 trace.go:76] Trace[946890212]: “Update /apis/management.cattle.io/v3/namespaces/c-2m7xj/nodes/m-n9tkl” (started: 2019-05-12 19:30:57.883437464+0000 UTC m=+227.091380919) (total time: 3.91495619s):
Trace[946890212]: [3.914357412s] [3.913165632s] Object stored in database
I0512 19:31:01.799254 6 trace.go:76] Trace[504662872]: “GuaranteedUpdate etcd3: *unstructured.Unstructured” (started: 2019-05-12 19:30:58.545715951 +0000 UTC m=+227.753659460) (total time: 3.253454418s):
Trace[504662872]: [3.250158453s] [3.247995087s] Transaction committed
I0512 19:31:01.799817 6 trace.go:76] Trace[2110925815]: “Update /apis/management.cattle.io/v3/namespaces/c-2m7xj/nodes/m-s5jmr” (started: 2019-05-12 19:30:58.543620243 +0000 UTC m=+227.751563798) (total time: 3.256156852s):
Trace[2110925815]: [3.255670122s] [3.253850816s] Object stored in database
2019/05/12 19:31:01 [INFO] stdout: (k8s-fm-20190512-master1) Reconfiguring VM
2019/05/12 19:31:01 [INFO] (k8s-fm-20190512-master1) Reconfiguring VM
2019/05/12 19:31:01 [INFO] stdout: (k8s-fm-20190512-worker3) Reconfiguring VM
2019/05/12 19:31:01 [INFO] (k8s-fm-20190512-worker3) Reconfiguring VM
2019/05/12 19:31:01 [INFO] stdout: (k8s-fm-20190512-worker2) Reconfiguring VM
2019/05/12 19:31:01 [INFO] (k8s-fm-20190512-worker2) Reconfiguring VM
2019/05/12 19:31:02 [INFO] stdout: (k8s-fm-20190512-master2) adding network: Private Network
2019/05/12 19:31:02 [INFO] (k8s-fm-20190512-master2) adding network: Private Network
I0512 19:31:08.037062 6 leaderelection.go:231] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0512 19:31:08.037167 6 server.go:207] lost master
lost lease

Once the container restarts, it retries a few times, then deletes the VMs in vSphere and crashes at the same spot. It restarts and repeats this until I delete the cluster.
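To confirm that every restart really dies at the same spot, a small Python sketch like the one below can pull the warning and failure lines out of a saved log dump. The marker patterns are copied from the excerpt above; the script name, the log file name, and the choice of these particular lines as the "crash signature" are assumptions for illustration. It only locates lines and does not diagnose the cause.

```python
import re
import sys

# Patterns copied from the log excerpt in this post; the list itself is an
# assumption and can be extended with other failure signatures.
MARKERS = {
    "slow_wal_sync": re.compile(r"wal: sync duration of ([\d.]+)s"),
    "slow_apply": re.compile(r"apply entries took too long \[([\d.]+)s"),
    "lease_renew_failed": re.compile(r"failed to renew lease (\S+):"),
    "lost_master": re.compile(r"lost master"),
}


def scan(lines):
    """Yield (line_number, marker_name, captured_value_or_None) for each hit."""
    for n, line in enumerate(lines, start=1):
        for name, pattern in MARKERS.items():
            m = pattern.search(line)
            if m:
                # Patterns without a capture group report None as the value.
                yield n, name, (m.group(1) if m.groups() else None)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   docker logs <container> > rancher.log 2>&1
    #   python scan_logs.py rancher.log
    with open(sys.argv[1]) as f:
        for n, name, value in scan(f):
            print(f"line {n}: {name}" + (f" ({value})" if value else ""))
```

If the same sequence (slow WAL sync, failed lease renewal, lost master) shows up before each restart, that supports the "same spot every time" observation.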

rancher/rancher:latest and rancher/rancher:v2.2.2-patch1-rc2 behave the same.

Any ideas? Anything I should try?

Thanks

Tim

TimothyMeade · May 16, 2019, 10:29am

I have rolled back to v2.2.0 and also tried the :latest image from yesterday. I have set up a completely new VM and gone through the setup TWICE. This happens every time. I've read that it has something to do with the TLS certificates. We are literally stuck here, as we cannot deploy ANYTHING now. Is this even being worked on? I've seen quite a few similar posts here and on GitHub.

Is there anyone I can contact about this? We are being forced to look for alternatives now, as creating new clusters on vSphere is dead in the water.

TimothyMeade · May 18, 2019, 9:27pm

I'm providing an update. We have gotten this working, and I wanted to document the fix for others, since I've seen several people with similar issues.

Make CERTAIN that the hosts file is correct on the server running the Rancher Docker container.

The entry 127.0.0.1 localhost is a must.
We also had an incorrect entry for the server's own IP and hostname in the file. I personally believe this was the issue the entire time.

So your /etc/hosts file should contain a line like:
1.1.1.1 myhostname

Make sure the server's IP and its correct hostname are both there. I suspect correct FQDN reverse DNS might also solve it. In our case the IP was correct but the name was not. I think this was causing the TLS issues.

I'm available to discuss if needed.
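The two hosts-file checks above (a 127.0.0.1 localhost entry, and the server's own hostname mapped to its real IP) can be scripted. This is a minimal Python sketch, assuming a standard hosts file with one IP followed by one or more names per line; the function and script names are hypothetical, 1.1.1.1 and myhostname are the placeholders from the post, and it does not check reverse DNS.

```python
import os
import socket
import sys


def parse_hosts(text):
    """Map each hostname in hosts-file text to the list of IPs it appears with."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: an IP with no names
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(ip)
    return mapping


def check_hosts(text, hostname, expected_ip=None):
    """Return a list of problems; an empty list means both checks passed."""
    mapping = parse_hosts(text)
    problems = []
    if "127.0.0.1" not in mapping.get("localhost", []):
        problems.append("missing '127.0.0.1 localhost'")
    ips = mapping.get(hostname, [])
    if not ips:
        problems.append(f"no entry for hostname {hostname!r}")
    elif expected_ip is not None and expected_ip not in ips:
        problems.append(f"{hostname!r} maps to {ips}, expected {expected_ip}")
    return problems


if __name__ == "__main__" and os.path.exists("/etc/hosts"):
    # Hypothetical usage: python check_hosts.py <server-ip>
    # socket.gethostname() may return a short name or an FQDN depending on the OS.
    expected = sys.argv[1] if len(sys.argv) > 1 else None
    with open("/etc/hosts") as f:
        for problem in check_hosts(f.read(), socket.gethostname(), expected):
            print("PROBLEM:", problem)
```

An empty output means both entries look right; in the case described here, the hostname check would have flagged the wrong name even though the IP was correct.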
