Installing RancherOS under VMware

I’m trying to install under VMware, but not having much luck.

I have booted a new VMware guest from the RancherOS 0.4.0 ISO image, and logged in over the network.

From the command line, I verified that I have Internet connectivity (I can do a DNS lookup and ping an external host).

Create a config.yaml with the necessary key.

Run “ros install -c config.yaml -d /dev/sda”

INFO[0000] No install type specified...defaulting to generic
Installing from rancher/os:v0.4.0
Continue [y/N]: y
Unable to find image 'rancher/os:v0.4.0' locally
Pulling repository docker.io/rancher/os
Network timed out while trying to connect to https://index.docker.io/v1/repositories/rancher/os/images. You may want to check your internet connection or if you are behind a proxy.
FATA[0026] Failed to run install err=exit status 1

Now, the strange thing is that from the command line I can retrieve that URL:

[root@rancher rancher]# wget https://index.docker.io/v1/repositories/rancher/os/images
Connecting to index.docker.io (52.7.162.45:443)
images 100% |*******************************| 7084 0:00:00 ETA

So, why would the ros install get this error? Incidentally it also happens using the 0.3 ISO image.

On a separate note, it would be great if RancherOS were supplied as a VMware .ovf template…

@sshipway Looks like you’re having trouble connecting to index.docker.io port 443 from inside your VM.

Try this command after booting your VM from the RancherOS install ISO or another Linux live CD image that has openssl:

openssl s_client -connect index.docker.io:443 <<< ''

Its output should begin with CONNECTED(00000003) followed by various TLS/SSL connection info. Otherwise, you won’t be able to pull docker images on any guest OS.

Using the above openssl command, the connection completes without issue. Similarly (as I mentioned above) using wget to this URL also works.

The problem is that the install still fails with the network timeout. I’m guessing that the error is something else but the install procedure is reporting the wrong problem for some reason.

Run “ros install -c config.yaml -d /dev/sda”

Did you try “sudo ros …”?
Installed 0.4 without problems a short time ago.

Yes, I was using sudo.

The problem seems to be in accessing the registry for some reason. Possibly it could be caused by the SSL certificate validation stage, if the root cert is not in the CA bundle? There is no option to ‘ros install’ to skip the SSL cert verification step.

@sshipway Are you still having the same problem? I used to have connectivity problems with the Docker registry myself, possibly because of maintenance on their side.

Please write back if the problem is still there. If that’s the case, I’d really like to get to the root cause.

Also, what VMware software are you using and what version?

I’ve only experienced a problem installing RancherOS on VMware Fusion 8 when I forgot to add memory to the VM. 1024 MB works fine.

@imikushin - it is still happening.

I’m using a VMware vSphere VM version 8, running on ESXi version 6.0. The guest has 16GB memory and 4 vCPUs.

The guest is connected to the Internet via NAT.

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.10.25.254    0.0.0.0         UG        0 0          0 eth0
10.10.25.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.18.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker-sys

Installation still produces the error:

[rancher@rancher ~]$ sudo ros install -c config.yml -d /dev/sda
INFO[0000] No install type specified...defaulting to generic
Installing from rancher/os:v0.4.0
Continue [y/N]: y
Unable to find image 'rancher/os:v0.4.0' locally
Pulling repository docker.io/rancher/os
Network timed out while trying to connect to https://index.docker.io/v1/repositories/rancher/os/images. You may want to check your internet connection or if you are behind a proxy.
FATA[0029] Failed to run install                         err=exit status 1

However the web address is contactable:

[rancher@rancher ~]$ wget  https://index.docker.io/v1/repositories/rancher/os/images
Connecting to index.docker.io (54.173.111.219:443)
images               100% |*******************************|  7084   0:00:00 ETA

Note that I have modified /etc/resolv.conf to point to our own DNS servers, as the Google DNS servers (8.8.8.8 etc) are not contactable from our subnet. Is it possible that the ros command is using 8.8.8.8 even though resolv.conf specifies something else? I have tried putting the IP in /etc/hosts but this does not help.
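One way to narrow down the DNS question is to list which nameservers a given resolv.conf actually contains and query each one directly. The following is only a minimal sketch: list_nameservers and query_each are hypothetical helper names, it assumes awk and nslookup are available on the console, and it does not show which resolver file the ros install itself ends up reading.

```shell
#!/bin/sh
# Print the nameserver addresses found in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query one host name against each listed nameserver in turn, so that
# an unreachable server (for example 8.8.8.8) stands out in the output.
query_each() {
    host=$1
    file=${2:-/etc/resolv.conf}
    for ns in $(list_nameservers "$file"); do
        echo "== $ns =="
        nslookup "$host" "$ns"
    done
}
```

For example, `query_each index.docker.io` would run one lookup per configured server. If 8.8.8.8 shows up in a file you did not expect, that is a hint that something other than your edited /etc/resolv.conf is in use.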

As the install runs under busybox there are not many utilities available to track down what’s going on.

Hi sshipway,

I’m pretty sure the problem is the following:

I had the same problem when I was using 172.18.1.0/24 for the network on my LAN port.
Packets that should have gone to my default gateway (172.18.1.1) were sent to the docker-sys “interface” instead and never reached the gateway.

You can fix this in the cloud-config.yml (example):

system-docker:
  extra_args:
  - --fixed-cidr
  - 172.31.42.1/16

I know your LAN IP is in the 10.10.25.0/24 range, but is there anything in between that uses the 172.18.0.0/16 range?
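To check whether a particular address (a gateway, proxy or DNS server, say) collides with the docker-sys subnet, something like the following could be used. It is a minimal sketch in plain POSIX sh: ip_to_int and in_cidr are hypothetical helper names, the shell is assumed to have 64-bit arithmetic, and 172.18.0.0/16 is the docker-sys range from the routing table posted earlier in this thread.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Split the address on dots into $1..$4.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Build the netmask from the prefix length.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Compare my old gateway and the gateway from this thread
# against the docker-sys subnet.
for ip in 172.18.1.1 10.10.25.254; do
    if in_cidr "$ip" 172.18.0.0/16; then
        echo "$ip overlaps docker-sys"
    else
        echo "$ip is outside docker-sys"
    fi
done
```

Any address reported as overlapping would be routed to docker-sys rather than out through eth0.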

Peter

I have just tried again, with RancherOS 0.4.2 and your addition to the cloud-config.yml, and it now works! No more hanging when trying to download the images. Not sure if this is down to v0.4.2 or the extra option, but whichever it is, it solved the problem.

I don’t think we have any 172.18 around the place getting in the way, but of course it is possible.

Thanks for the pointer.