Neither Rancher-nfs nor Rancher-efs will successfully spin up

I must have something fundamentally wrong in my setup. Nobody else seems to have this exact issue, although people are hitting other problems. Anyhow, I have upgraded to 1.2 and tried to spin up a Rancher-nfs stack. The nfs-driver service spins up a container on each host, but each container starts and then stops with the following errors:

time="2016-12-15T05:28:17Z" level=info msg=Running
time="2016-12-15T05:28:17Z" level=info msg=Starting
time="2016-12-15T05:28:17Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
time="2016-12-15T05:28:19Z" level=info msg=Running
time="2016-12-15T05:28:19Z" level=info msg=Starting
time="2016-12-15T05:28:19Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

I don’t have a clue what could be going wrong here. Could it be the version of Docker? I’m running 1.11.2.

As a troubleshooting step I copied the compose files and started a custom stack with the same config, except that I changed it to bind-mount docker.sock (- /var/run/docker.sock:/var/run/docker.sock; excerpt after the log below). That got rid of the above error, but then it failed with the following one:

time="2016-12-15T05:38:33Z" level=info msg=Running
time="2016-12-15T05:38:33Z" level=info msg=Starting
time="2016-12-15T05:38:35Z" level=fatal msg="getting driver ID: rancher-nfs is not a driver registered with the current Rancher environment"
time="2016-12-15T05:39:03Z" level=info msg=Running
time="2016-12-15T05:39:03Z" level=info msg=Starting
time="2016-12-15T05:39:05Z" level=fatal msg="getting driver ID: rancher-nfs is not a driver registered with the current Rancher environment"
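For reference, the only change I made to the copied docker-compose.yml was that bind mount under the driver service, roughly like this (everything else left exactly as in the copied file):

nfs-driver:
  # image, privileged flag, etc. unchanged from the copied compose file
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock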

Any help would be appreciated. I’m completely lost on this one. There seems to be no documentation on how to run this stack, although it should be straightforward. I don’t know if I am missing a dependency or what other issue in my environment might be causing this.


I am having the same problem… subscribing to this thread.

Same issue here as well with Rancher NFS on v1.2.2.

Thankfully our Convoy NFS stack is still working and providing storage to its container stack, so we don’t strictly need this yet, but it seems like Rancher NFS will be maintained while Convoy NFS goes away, and I’d like to move sooner rather than later.

I copied the same configuration,

  • IP address of EFS server due to VPC restrictions
  • Mount point
  • Mount options

and it continually fails with the same "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" error. All other containers are running, so this isn’t a general Docker issue.

Since Convoy NFS already uses NFSv4, the kernel support and packages in the host OS are there as well.

@ecliptik I have been doing a bit more testing.

I am using a vSphere-based setup, and when I provision hosts they come up with the default (latest) boot2docker image. That failed.

I then tried to recreate the setup on Digital Ocean (using the Ubuntu 16.x image for the hosts) and I could bring NFS up.

So I manually provisioned Ubuntu 16.x hosts on vSphere, imported them into Rancher as “custom hosts”, and I could bring NFS up there as well.

I am leaning toward concluding that it is some sort of incompatibility between Rancher NFS and boot2docker (at least based on my limited testing).

What OS are you using for your hosts?

@mreferre, we’re using Amazon Linux 2016.09, which afaik is a derivative of RHEL 6 with a 4.4 kernel and an Amazon-maintained yum repo, so the software versions are a bit fresher (Docker 1.11.2, for example).

I logged directly into one of the Rancher EC2 instances in an environment where I tried to deploy Rancher NFS against Amazon EFS, and mounting EFS manually works properly, so I would assume it would work in an Ubuntu 16.04 container as well.

On the Rancher host instance:

[ec2-user@ip-10-1-3-106 ~]$ sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 10.1.2.250:/ /mnt
[ec2-user@ip-10-1-3-106 mnt]$ mount | grep 10.1.2.250
10.1.2.250:/ on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.3.106,local_lock=none,addr=10.1.2.250)

An Ubuntu 16.04 privileged container on the Rancher host instance also works after installing the nfs-common package:

[ec2-user@ip-10-1-3-106 ~]$ sudo docker run -it --rm --privileged ubuntu:16.04 /bin/bash
root@dbed76682b6b:/# apt-get update && apt-get install -y nfs-common
root@dbed76682b6b:/# mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 10.1.2.250:/ /mnt
root@dbed76682b6b:/# mount | grep 10.1.2.250
10.1.2.250:/ on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.0.13,local_lock=none,addr=10.1.2.250)

So it seems like it’s something in the Rancher NFS service stack itself. I’ll take a look at the catalog entry on GitHub and see if there’s anything I may have missed.

@ecliptik I will take my previous update partially back … I have just tried to redeploy from scratch on vSphere with Ubuntu 16 hosts and now it fails (albeit with a different and very generic error message saying the containers can’t be started). Mh…

I’m glad I’m not the only one struggling with this. I have given up for the time being as I am trying to solve other problems elsewhere. I have reverted temporarily to installing nfs-utils in my container images and passing the mount commands in as environment variables (rough sketch below). It’s not optimal because I have to alter my images to accommodate NFS, but it works without fail.
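In case it helps anyone, the workaround looks roughly like this. The ENTRYPOINT script and the NFS_SERVER/NFS_EXPORT variable names are just illustrative, the container has to run privileged (or with CAP_SYS_ADMIN) so it can mount, and the package is nfs-common on Debian/Ubuntu bases rather than nfs-utils:

Dockerfile:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y nfs-common && rm -rf /var/lib/apt/lists/*
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
# Mount the NFS export described by the (illustrative) NFS_SERVER/NFS_EXPORT variables, then run the app
set -e
mkdir -p /mnt/nfs
mount -t nfs -o nfsvers=4.1 "${NFS_SERVER}:${NFS_EXPORT}" /mnt/nfs
exec "$@"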

My next troubleshooting step is to stand up a fresh 1.2 environment to see whether this all has something to do with the upgrade. I’ve seen others start up these stacks without issue, but I think in all of those cases they were fresh installs. So I was wondering if anyone on this thread is experiencing these problems on a fresh 1.2 environment.

There were some known boot2docker issues with 1.2; I believe those were addressed in 1.2.1 or 1.2.2. You should give that a spin and see if you get further with rancher-nfs. It would also be helpful if you could provide the rancher-nfs template version in use.

Hi @opax,

It looks like you are right on track. The /var/run folder was migrated to /run in newer operating systems, and it appears that you may be running an older OS. The legacy folder locations are symlinked for compatibility purposes, so I think it’s a safe bet to change our storage driver templates to expose the legacy path. I will do so shortly.
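In other words, the template change amounts to binding the Docker socket via the legacy path, something along these lines (a sketch of the intent, not the exact template diff):

nfs-driver:
  volumes:
    # - /run/docker.sock:/run/docker.sock        <- new-style path; missing on older distros
    - /var/run/docker.sock:/var/run/docker.sock  # legacy path; a symlink into /run on newer distros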

Here is the ticket for tracking: https://github.com/rancher/rancher/issues/7237

Also, FYI, the reason your modified stack didn’t start up when you made the change is the absence of a ‘system’ flag that must be present for infrastructure stacks to behave appropriately. This value is set for all templates in the infra-templates folder that are launched from the UI. You might also get away with using our CLI, e.g. rancher stack create --system, but I haven’t personally tested this path WRT storage plugins.
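Roughly like this (the stack name is just an example; check the CLI help for the compose-file flags your version supports):

rancher stack create --help                # confirm available flags for your CLI version
rancher stack create rancher-nfs --system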

@LLParse thank you for the response, that github issue is exactly what we’re experiencing with Amazon Linux 2016.09.

@LLParse could this also affect the latest boot2docker image deployed using the vSphere driver? I have seen the error message @opax and @ecliptik mentioned in that scenario as well.

@mreferre It certainly could. The fix is available in a branch but hasn’t yet been validated by QA.

You are welcome to give it a try - override the Rancher Certified Library by adding a custom catalog:

Name: library
URL: https://git.rancher.io/rancher-catalog
Branch: v1.3.0-rc3

Best of luck!

I had the same issue as @mreferre. Now, with Rancher 1.3.0, the nfs-driver fires up, but when I create a volume and try to mount it in a container I get:

Error (Error response from daemon: create Volume: create Volume: Error looking up volume plugin rancher-nfs: plugin not found

Am I missing something?

@mreferre I have had similar issues with 1.3rc3 (I think). I apologize that I can’t be more precise, but I am playing with a lab environment and have done multiple tests that I didn’t keep precise track of.

I think I have seen that message when deploying boot2docker images (i.e. NFS comes up fine but creating volumes and attaching them to containers spits that error). I was able to make it work by installing Ubuntu 16.04 VMs and importing them “custom”. That worked.

@mreferre Yes, other OSes seem to work (RancherOS in my case).
The problem is that, in order to use it productively, I would need it working on boot2docker, or on anything else that can be deployed through the interface to vSphere (RancherOS does not work on vSphere).

+1 for this, exactly the same issue with both rancher-ebs and rancher-nfs. We too are using Amazon Linux.

I’m currently speaking directly to Raul from Rancher so will report back with any updates.

To fix the infrastructure stack issue, I was able to change the ‘system’ flag of an existing stack to make it an infrastructure stack.

If you click the “View in API” button on a stack in the Rancher UI, it will bring you to the API viewer. If you click “Edit” in the upper right-hand corner of the page, you can tick the checkbox labeled ‘system’ and the stack will move to the infrastructure section with the other infrastructure stacks.

I’m not sure if any kind of restart of the stack is necessary to make it work like an infrastructure stack after that, though.
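For anyone who prefers to script this instead of clicking through the UI, the API edit above should be roughly equivalent to a PUT against the stack resource. The URL is whatever the “View in API” button shows for your stack; the endpoint shape and the assumption that a partial PUT of the ‘system’ field works are mine, untested:

# STACK_URL is the address shown by "View in API", e.g. https://<rancher-server>/v2-beta/projects/<env-id>/stacks/<stack-id>
curl -s -u "$ACCESS_KEY:$SECRET_KEY" \
  -X PUT -H 'Content-Type: application/json' \
  -d '{"system": true}' \
  "$STACK_URL"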