Neither Rancher-nfs nor Rancher-efs will successfully spin up

I must have something fundamentally wrong in my setup; nobody else seems to have this exact issue, although others are hitting different problems. Anyhow, I have upgraded to 1.2 and tried to spin up a Rancher-nfs stack. The nfs-driver service spins up a container on each host, but each container starts and then stops with the following errors:

12/14/2016 11:28:17 PM time="2016-12-15T05:28:17Z" level=info msg=Running
12/14/2016 11:28:17 PM time="2016-12-15T05:28:17Z" level=info msg=Starting
12/14/2016 11:28:17 PM time="2016-12-15T05:28:17Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
12/14/2016 11:28:19 PM time="2016-12-15T05:28:19Z" level=info msg=Running
12/14/2016 11:28:19 PM time="2016-12-15T05:28:19Z" level=info msg=Starting
12/14/2016 11:28:19 PM time="2016-12-15T05:28:19Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

I don't have a clue what could be going wrong here. Could it be the version of Docker? I'm running 1.11.2.

As a troubleshooting step I copied the compose files and started up a custom stack using the same config, except that I changed it to bind-mount docker.sock (- /var/run/docker.sock:/var/run/docker.sock). That got rid of the above error, but then it failed with the following one:
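
For reference, the change amounted to adding the socket bind-mount to the service definition, roughly like this (a sketch only; the service name and image placeholder are mine, not the actual catalog template):

services:
  nfs-driver:
    image: <nfs-driver image from the catalog template>   # placeholder
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # the added line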

12/14/2016 11:38:33 PM time="2016-12-15T05:38:33Z" level=info msg=Running
12/14/2016 11:38:33 PM time="2016-12-15T05:38:33Z" level=info msg=Starting
12/14/2016 11:38:35 PM time="2016-12-15T05:38:35Z" level=fatal msg="getting driver ID: rancher-nfs is not a driver registered with the current Rancher environment"
12/14/2016 11:39:03 PM time="2016-12-15T05:39:03Z" level=info msg=Running
12/14/2016 11:39:03 PM time="2016-12-15T05:39:03Z" level=info msg=Starting
12/14/2016 11:39:05 PM time="2016-12-15T05:39:05Z" level=fatal msg="getting driver ID: rancher-nfs is not a driver registered with the current Rancher environment"

Any help would be appreciated. I'm completely lost on this one. There seems to be no documentation on how to run this stack, although it should be straightforward. I don't know if I am missing a dependency or what other issues might be going on in my environment to cause this.


I am having the same problem… subscribing to this thread.

Same issue here as well with Rancher NFS on v1.2.2.

Thankfully our Convoy NFS stack is still working and providing storage to its container stack, so we don't strictly need this, but it seems like Rancher NFS will be maintained while Convoy NFS is going away, and I'd like to move sooner rather than later.

I copied the same configuration,

  • IP address of the EFS server (due to VPC restrictions)
  • Mount point
  • Mount options

and it continually fails with the same "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" error. All other containers are running, so this isn't an issue with Docker itself.

Since Convoy NFS already uses NFSv4, the kernel support and packages it needs are present in the host OS as well.
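
For anyone wanting to run the same sanity check on their own hosts (nfs-utils is the RHEL-family/Amazon Linux package name; Debian/Ubuntu hosts use nfs-common instead):

grep nfs /proc/filesystems   # lists nfs/nfs4 when kernel support is available
rpm -q nfs-utils             # confirms the userspace NFS tooling is installed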

@ecliptik I have been doing a bit more testing.

I am using a vSphere-based setup, and when I provision hosts they get provisioned with the default (latest) boot2docker image. That failed.

I then tried to recreate the setup on Digital Ocean (using the Ubuntu 16.xx image for the hosts) and I could bring NFS up.

So I tried to manually provision Ubuntu 16.xx hosts on vSphere and import them into Rancher as "custom hosts", and I could bring NFS up now.

I am inclined to conclude that it is some sort of incompatibility between Rancher NFS and boot2docker (at least based on my limited testing).

What OS are you using for your hosts?

@mreferre, we're using Amazon Linux 2016.09, which AFAIK is a derivative of RHEL 6 with a 4.4 kernel and an Amazon-maintained yum repo, so the software versions are a bit fresher (Docker 1.11.2, for example).

I logged directly into one of the Rancher EC2 instances in an environment where I had tried to deploy Rancher NFS connecting to Amazon EFS, and mounting EFS works properly there, so I would assume it would work in an Ubuntu 16.04 container as well.

On the Rancher host instance:

[ec2-user@ip-10-1-3-106 ~]$ sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 10.1.2.250:/ /mnt
[ec2-user@ip-10-1-3-106 mnt]$ mount | grep 10.1.2.250
10.1.2.250:/ on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.3.106,local_lock=none,addr=10.1.2.250)

An Ubuntu 16.04 privileged container on the Rancher host instance works after installing the nfs-common package:

[ec2-user@ip-10-1-3-106 ~]$ sudo docker run -it --rm --privileged ubuntu:16.04 /bin/bash
root@dbed76682b6b:/# apt-get update && apt-get install -y nfs-common
root@dbed76682b6b:/# mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 10.1.2.250:/ /mnt
root@dbed76682b6b:/# mount | grep 10.1.2.250
10.1.2.250:/ on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.0.13,local_lock=none,addr=10.1.2.250)

So it seems like something in the Rancher NFS service stack itself. I'll take a look at the catalog entry on GitHub and see if there's anything I may have missed.

@ecliptik I will take my previous update partially back… I have just tried to redeploy from scratch on vSphere with Ubuntu 16 hosts and now it fails (albeit with a different and very generic error message saying the containers can't be started). Mh…

I'm glad I'm not the only one struggling through this. I have given up for the time being as I am trying to solve other problems elsewhere. I have reverted temporarily to just installing nfs-utils in my container images and passing in the mount commands as environment variables. It's not optimal because I have to alter my images to facilitate NFS, but it works without fail.
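
The workaround looks roughly like this (a sketch only; the variable names and entrypoint path are my own convention, and the container must run privileged for mount to succeed):

#!/bin/sh
# entrypoint.sh - mount NFS from environment variables, then start the app.
# NFS_SERVER, NFS_EXPORT and NFS_MOUNTPOINT are passed in with -e at run time.
set -e
mkdir -p "$NFS_MOUNTPOINT"
mount -t nfs -o nfsvers=4.1 "$NFS_SERVER:$NFS_EXPORT" "$NFS_MOUNTPOINT"
exec "$@"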

My next troubleshooting step is to stand up a fresh 1.2 environment to see if this all has something to do with the upgrade. I've seen others be able to start up these stacks without issue, but I think in all those cases they were fresh installs. So I was wondering if anyone on this thread is experiencing these problems on a fresh 1.2 environment.

There were some known boot2docker issues with 1.2; I believe those were addressed in 1.2.1 or 1.2.2. You should give that a spin and see if you get further with rancher-nfs. It would also be helpful if you could provide the rancher-nfs template version in use.

Hi @opax,

It looks like you are right on track. The /var/run folder was migrated to /run in newer operating systems - it appears that you may be running an older OS. The legacy folder locations are symlinked for compatibility purposes, so I think it's a safe bet to change our storage driver templates to expose the legacy path. I will do so shortly.
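
A quick way to see which layout a given host uses (output will vary by OS):

ls -ld /var/run
# newer OS: lrwxrwxrwx ... /var/run -> /run   (symlink into /run)
# older OS: drwxr-xr-x ... /var/run           (a real directory)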

Here is the ticket for tracking: https://github.com/rancher/rancher/issues/7237

Also, FYI, the reason that your modified stack didn't start up when you made the change is the absence of a 'system' flag that must be present for infrastructure stacks to behave appropriately. This value is set for all templates in the infra-templates folder that are launched from the UI. You might also get away with using our CLI, e.g. rancher stack create --system, but I haven't personally tested this path with respect to storage plugins.
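
In other words, something along these lines (untested; the stack name is a placeholder, and anything beyond the --system flag is an assumption - check the CLI help for the flags that supply the compose files):

rancher stack create my-nfs-stack --system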

@LLParse thank you for the response, that GitHub issue is exactly what we're experiencing with Amazon Linux 2016.09.

@LLParse could this also affect the latest boot2docker image deployed using the vSphere driver? I have seen the error message @opax and @ecliptik mentioned in that scenario as well.

@mreferre It certainly could. The fix is available in a branch but hasn't yet been validated by QA.

You are welcome to give it a try - override the Rancher Certified Library by adding a custom catalog:

Name: library
URL: https://git.rancher.io/rancher-catalog
Branch: v1.3.0-rc3
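
If you want to sanity-check that the branch exists before adding the catalog, a quick query from any machine with git installed:

git ls-remote --heads https://git.rancher.io/rancher-catalog v1.3.0-rc3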

Best of luck!

I had the same issue as @mreferre. Now with Rancher 1.3.0 the nfs-driver fires up, but when I create a volume and try to mount it in a container I get:

Error (Error response from daemon: create Volume: create Volume: Error looking up volume plugin rancher-nfs: plugin not found

Am I missing something?

@mreferre I have had similar issues with 1.3rc3 (I think). I apologize that I can't be more precise, but I am playing with a lab environment and have done multiple tests (that I didn't keep precise track of).

I think I have seen that message when deploying boot2docker images (i.e. NFS comes up fine, but creating volumes and attaching them to containers spits out that error). I was able to make it work by installing Ubuntu 16.04 VMs and importing them as "custom" hosts. That worked.

@mreferre Yes, other OSes seem to work (RancherOS in my case).
The problem is that, in order to use it productively, I would need it working on boot2docker, or anything else that can be deployed through the interface to vSphere (RancherOS does not work on vSphere).

+1 for this, exactly the same issue with both rancher-ebs and rancher-nfs. We too are using Amazon Linux.

I'm currently speaking directly to Raul from Rancher so will report back with any updates.

To fix the infrastructure stack issue, I was able to change the 'system' flag of an existing stack to make it an infrastructure stack.

If you click the "View in API" button on a stack from the Rancher UI it will bring you to the API viewer. If you click "Edit" in the upper right-hand corner of the page, you can check the check-box labeled 'system' and it will put the stack in the infrastructure section with the other infrastructure stacks.

I'm not sure if any kind of restart of the stack is necessary to make it work like an infrastructure stack after that, though.
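
For anyone who prefers doing this outside the UI, the same edit can presumably be made with a direct API call. This is an untested sketch: the v2-beta path may differ by Rancher version, the project and stack IDs are placeholders, and the keys are an ordinary environment API key pair:

curl -s -u "$ACCESS_KEY:$SECRET_KEY" \
  -X PUT -H 'Content-Type: application/json' \
  -d '{"system": true}' \
  "https://rancher.example.com/v2-beta/projects/<project-id>/stacks/<stack-id>"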