Vsphere Provisioning - "Stuck at SSH Step"

Using Rancher 2.0.0 I am trying to build nodes through vSphere node driver.

I am getting the VM created with Rancher OS latest, however I get stuck on rancher “Waiting for SSH to be Available” It then after several attempts deletes the VM and recreates it.

If rancher just created the VM why would it not have access to the VM to SSH? Is there a step I am missing? I’d really love to be able to provision nodes through Rancher UI, however I am stuck at this point.

Thanks in advance.
Garrett

1 Like

Any help from the rancher team on this so we can properly kick the tires would be appreciated.

Kind Regards

1 Like

What ISO are you booting? Is the network it is attached to based on DHCP? Does it get an IP address? Is the host running the rancher/rancher container able to SSH to that IP (you can manually verify while the host is still up). Does the ISO fully boot (check VMware console)?

Hey @superseb , thank you for the reply.

I can’t speak for @Gneisler, but I am booting from the default ISO. Which from the settings looks like a boot2docker iso. The network is statically defined for me. So NO DHCP. If you could tell me how to pass a statically defined IP address to the node through the use of Rancher, i would greatly appreciate that.

The ISO does fully boot, but the eth0 address does not have an IP and therefore I can’t reach it from the rancher/rancher container via ssh.

Do you have any other questions or thoughts on your end?

Kind Regards,
Jock

@superseb
I am running Rancher 1.3.0, DHCP setup and my VM does come up and obtains an IP. 10.10.10.170 in this case and it fully boots.

I am able to get to login on SSH (Putty), however I dont know the password and I think that is part of the problem that Rancher can’t SSH into the newly created box either.

Yes, my Rancher Host (10.10.10.100) Can SSH into 10.10.10.170, however password is still unknown.

Here is my logs from the node creation (Note, I do have another Custom Cluster running) :

2018-05-21 19:51:24.323176 I | mvcc: store.index: compact 2808835
2018-05-21 19:51:24.452858 I | mvcc: finished scheduled compaction at 2808835 (took 114.099566ms)
2018-05-21 19:56:24.344569 I | mvcc: store.index: compact 2809313
2018-05-21 19:56:24.476282 I | mvcc: finished scheduled compaction at 2809313 (took 115.453652ms)
2018-05-21 20:01:24.371517 I | mvcc: store.index: compact 2809793
2018-05-21 20:01:24.501974 I | mvcc: finished scheduled compaction at 2809793 (took 114.967135ms)
2018-05-21 20:06:24.393680 I | mvcc: store.index: compact 2810272
2018-05-21 20:06:24.526386 I | mvcc: finished scheduled compaction at 2810272 (took 115.516733ms)
E0521 20:08:39.056161 1 streamwatcher.go:109] Unable to decode an event from the watch stream: json: cannot unmarshal string into Go struct field dynamicEvent.Object of type v3.NodeStatus
2018-05-21 20:11:24.423893 I | mvcc: store.index: compact 2810749
2018-05-21 20:11:24.554361 I | mvcc: finished scheduled compaction at 2810749 (took 113.94832ms)
2018-05-21 20:16:24.446828 I | mvcc: store.index: compact 2811227
2018-05-21 20:16:24.580335 I | mvcc: finished scheduled compaction at 2811227 (took 117.357646ms)
2018-05-21 20:21:24.470289 I | mvcc: store.index: compact 2811708
2018-05-21 20:21:24.603625 I | mvcc: finished scheduled compaction at 2811708 (took 116.863508ms)
2018-05-21 20:26:24.493156 I | mvcc: store.index: compact 2812186
2018-05-21 20:26:24.627190 I | mvcc: finished scheduled compaction at 2812186 (took 116.671218ms)
E0521 20:29:21.065702 1 streamwatcher.go:109] Unable to decode an event from the watch stream: json: cannot unmarshal string into Go struct field dynamicEvent.Object of type v3.NodeStatus
2018-05-21 20:31:24.517165 I | mvcc: store.index: compact 2812664
2018-05-21 20:31:24.649320 I | mvcc: finished scheduled compaction at 2812664 (took 116.825798ms)
2018-05-21 20:33:21.362663 W | etcdserver: apply entries took too long [117.338042ms for 1 entries]
2018-05-21 20:33:21.362710 W | etcdserver: avoid queries with large range/delete range!
2018-05-21 20:36:24.538559 I | mvcc: store.index: compact 2813143
2018-05-21 20:36:24.673483 I | mvcc: finished scheduled compaction at 2813143 (took 118.002293ms)
2018-05-21 20:41:24.561623 I | mvcc: store.index: compact 2813623
2018-05-21 20:41:24.691912 I | mvcc: finished scheduled compaction at 2813623 (took 114.756605ms)
2018/05/21 20:46:07 [INFO] Running cluster events cleanup
2018/05/21 20:46:07 [INFO] Done running cluster events cleanup
2018-05-21 20:46:24.590199 I | mvcc: store.index: compact 2814101
2018-05-21 20:46:24.724396 I | mvcc: finished scheduled compaction at 2814101 (took 117.250385ms)
2018-05-21 20:51:24.613519 I | mvcc: store.index: compact 2814583
2018-05-21 20:51:24.745679 I | mvcc: finished scheduled compaction at 2814583 (took 116.752901ms)
E0521 20:53:51.075275 1 streamwatcher.go:109] Unable to decode an event from the watch stream: json: cannot unmarshal string into Go struct field dynamicEvent.Object of type v3.NodeStatus
2018-05-21 20:56:24.635070 I | mvcc: store.index: compact 2815062
2018-05-21 20:56:24.769395 I | mvcc: finished scheduled compaction at 2815062 (took 119.229249ms)
2018/05/21 20:57:37 [INFO] stdout: Creating CA: management-state/node/nodes/test-master1/certs/ca.pem
2018/05/21 20:57:38 [INFO] stdout: Creating client certificate: management-state/node/nodes/test-master1/certs/cert.pem
2018/05/21 20:57:38 [INFO] stdout: Running pre-create checks…
2018/05/21 20:57:43 [INFO] stdout: Creating machine…
2018/05/21 20:57:43 [INFO] stdout: (test-master1) Image cache directory does not exist, creating it at management-state/node/nodes/test-master1/cache…
2018/05/21 20:57:43 [INFO] stdout: (test-master1) Downloading management-state/node/nodes/test-master1/cache/boot2docker.iso from https://releases.rancher.com/os/latest/rancheros.iso
2018/05/21 20:57:49 [INFO] stdout: (test-master1) 0%!.(MISSING)…10%!.(MISSING)…20%!.(MISSING)…30%!.(MISSING)…40%!.(MISSING)…50%!.(MISSING)…60%!.(MISSING)…70%!.(MISSING)…80%!.(MISSING)…90%!.(MISSING)…100%!(NOVERB)
2018/05/21 20:57:49 [INFO] stdout: (test-master1) Generating SSH Keypair…
2018/05/21 20:57:55 [INFO] stdout: (test-master1) Creating VM…
2018/05/21 20:58:04 [INFO] stdout: (test-master1) Uploading Boot2docker ISO …
2018/05/21 20:58:11 [INFO] stdout: (test-master1) adding network: Data Network V101
2018/05/21 20:58:11 [INFO] stdout: (test-master1) Reconfiguring VM
2018/05/21 20:58:37 [INFO] stdout: (test-master1) Waiting for VMware Tools to come online…
2018/05/21 21:00:08 [INFO] stdout: (test-master1) Provisioning certs and ssh keys…
2018/05/21 21:00:09 [INFO] stdout: Waiting for machine to be running, this may take a few minutes…
2018/05/21 21:00:11 [INFO] stdout: Detecting operating system of created instance…
2018/05/21 21:00:11 [INFO] stdout: Waiting for SSH to be available…
2018-05-21 21:01:24.664589 I | mvcc: store.index: compact 2815543
2018-05-21 21:01:24.804049 I | mvcc: finished scheduled compaction at 2815543 (took 117.123078ms)
2018-05-21 21:06:24.695000 I | mvcc: store.index: compact 2816122
2018-05-21 21:06:24.845629 I | mvcc: finished scheduled compaction at 2816122 (took 134.439012ms)
2018/05/21 21:08:04 [INFO] stdout: The default lines below are for a sh/bash shell, you can specify the shell you’re using, with the --shell flag.
2018/05/21 21:08:04 [INFO] stdout:
2018/05/21 21:08:04 [INFO] Generating and uploading node config test-master1
E0521 21:08:04.877775 1 generic_controller.go:204] NodeController c-td7ld/m-bz9qd [node-controller] failed with : Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
2018/05/21 21:08:04 [INFO] NodeController c-td7ld/m-bz9qd completed with dropped err: [node-controller] failed with : Error
2018/05/21 21:08:05 [INFO] NodeController c-td7ld/m-bz9qd completed with dropped err: [node-controller] failed with : Error
E0521 21:09:39.082617 1 streamwatcher.go:109] Unable to decode an event from the watch stream: json: cannot unmarshal string into Go struct field dynamicEvent.Object of type v3.NodeStatus
2018/05/21 21:10:24 [INFO] stdout: Creating CA: management-state/node/nodes/test-master1/certs/ca.pem
2018/05/21 21:10:25 [INFO] stdout: Creating client certificate: management-state/node/nodes/test-master1/certs/cert.pem
2018/05/21 21:10:26 [INFO] stdout: Running pre-create checks…
2018/05/21 21:10:30 [INFO] stdout: Creating machine…
2018/05/21 21:10:30 [INFO] stdout: (test-master1) Image cache directory does not exist, creating it at management-state/node/nodes/test-master1/cache…
2018/05/21 21:10:30 [INFO] stdout: (test-master1) Downloading management-state/node/nodes/test-master1/cache/boot2docker.iso from https://releases.rancher.com/os/latest/rancheros.iso
2018/05/21 21:10:36 [INFO] stdout: (test-master1) 0%!.(MISSING)…10%!.(MISSING)…20%!.(MISSING)…30%!.(MISSING)…40%!.(MISSING)…50%!.(MISSING)…60%!.(MISSING)…70%!.(MISSING)…80%!.(MISSING)…90%!.(MISSING)…100%!(NOVERB)
2018/05/21 21:10:36 [INFO] stdout: (test-master1) Generating SSH Keypair…
2018/05/21 21:10:42 [INFO] stdout: (test-master1) Creating VM…
2018/05/21 21:10:55 [INFO] stdout: (test-master1) Uploading Boot2docker ISO …
2018/05/21 21:11:02 [INFO] stdout: (test-master1) adding network: Data Network V101
2018/05/21 21:11:02 [INFO] stdout: (test-master1) Reconfiguring VM
2018-05-21 21:11:24.722537 I | mvcc: store.index: compact 2816600
2018-05-21 21:11:24.852673 I | mvcc: finished scheduled compaction at 2816600 (took 114.539928ms)
2018/05/21 21:11:39 [INFO] stdout: (test-master1) Waiting for VMware Tools to come online…

Did you specify rancher as SSH User?

I have downloaded the rsa key from the created node. Converted this to a ppk file and attempted to ssh using putty using the newly created ppk file, logging in as rancher. Key is rejected.

My problem with this is Rancher itself can’t SSH to a node it creates…which means I am missing some configuration to specify the SSH password?!

What is the default password? What am I missing?

I think this is relevant

I attempted Rancher OS v1.4.0_rc2 and got past the SSH step. I did hit another error, however I will wait until 1.4.0 is out of RC to attempt again.

Consider this closed, thank you for everyones help.

i have same problem too, it is vsphere 6.5 with rancher 2.0. it shows in the rancher2 console that cannot bring up etcd cluster, waiting for ssh. i can login to the node OS from console, it is boot2docker os, but etcd container has no ‘4001’ listening, if i ran etcdctl member list, it shows me not able to connect to localhost:4001.

any help will be very appreciated!

Just would like to say thank you to the rancher team for releasing 1.4.0 Rancher OS. I am now able to spin up nodes through vmware just with a click of the button.

Thanks you!

I have hit this exact same issue. Is there a work around? rancher v2.2.2 . Should we revert? What does work? We need to spin up 2 clusters soonest.

Thanks