Convoy EFS to Rancher EFS

So, we set up a POC instance of 1.2 to experiment with what it will take to migrate to it. Serious kudos to the Rancher team on the new HA setup. I can't even begin to describe how much better and easier it is than before. Along with AWS's new Application ELB, which supports WebSockets out of the box, setting up a Rancher HA environment in AWS is now ridiculously easy.

We've run into a few minor issues, but the big one is the difference between Convoy EFS and Rancher EFS for persistent storage volumes. I may be misremembering, but before 1.2 you started the Convoy EFS service with a specific EFS file system, and each "volume" was then a subdirectory in that file system. After some initial head scratching about this approach, we came around to liking it quite a bit. It allowed us to have hundreds of isolated storage volumes without violating the quite tight limits on EFS file systems in AWS (10 before requesting more).

The new Rancher EFS is much more logical, in that one persistent storage volume equals one EFS file system. However, because we were using well over a hundred storage volumes for different purposes and applications, we're not sure how to migrate. I'm sure that AWS would increase our limits, but I'm not sure how keen they would be to grant the 10-20x increase we would need (we're a rather small AWS account). Despite the old system being a bit less straightforward, I think it fit our needs better.

Any suggestions?

Yeah, that's actually something I've been meaning to bring up as well. In my case, for the small personal hosting consultancy that I run, having ~10 EFS volumes is more expensive than having one EFS volume on which each site gets its own subdirectory, and that's a problem when I'm already operating at a loss.

It's also unclear to me how EFS HA works with regard to AZs. In my case I am using us-west-2 and have a node in each of us-west-2a, 2b, and 2c, but a new EFS volume created by Rancher only has a mount target in one AZ. Shouldn't this, either by default or via a configuration option, provision the volume with mount targets in all available AZs so that each node mounts the EFS volume from its local AZ?
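For what it's worth, the missing mount targets can be created outside of Rancher with the AWS CLI, something along these lines (the IDs below are placeholders for your own file system, subnets, and security group):

# Create a mount target in each AZ's subnet so every node has a local endpoint
aws efs create-mount-target --file-system-id fs-XXXXXXXX \
  --subnet-id subnet-AAAAAAAA --security-groups sg-XXXXXXXX
aws efs create-mount-target --file-system-id fs-XXXXXXXX \
  --subnet-id subnet-BBBBBBBB --security-groups sg-XXXXXXXX
aws efs create-mount-target --file-system-id fs-XXXXXXXX \
  --subnet-id subnet-CCCCCCCC --security-groups sg-XXXXXXXX
# Confirm there is now one mount target per AZ
aws efs describe-mount-targets --file-system-id fs-XXXXXXXX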

I suppose I could do some of the setup outside of Rancher and then use the Rancher NFS service to connect to EFS. This means setting up directories manually, etc. It's not a huge deal, but the prior experience was so seamless that this feels like a bit of a step backwards.

Actually, come to think of it, maybe that won't work. The NFS target will be different if the host is in a different AZ in AWS. That means I'll need to set up a complicated and static system of labels to make sure applications and NFS stores are in the same AZs, which is exactly one of the problems EFS was intended to fix. Grrr… So we're a little screwed here. There are some very nice advancements in 1.2 that we'd like to take advantage of, but unless we rethink the storage mechanism behind 5 different production applications, we're stuck.

It's interesting that we constantly see a stream of new "persistent distributed storage" products pop up in the Docker community, yet none seems to have gained a de facto leadership position, and what do we end up with as the only "official" options in Rancher? Two paths, both based on good ol' NFS. This is a critical area for truly dynamic Docker environments that doesn't appear to have a strong answer yet.

I realize I'm talking to myself here, but one can get "close" to my preferred setup by using rancher-nfs instead of rancher-efs. I tried to use "$(curl -s http://###.###.###.###/latest/meta-data/placement/availability-zone).fs-1ac6565853.efs.us-east-1.amazonaws.com" for the NFS server (along with the correct mount options from EFS's documentation) in the rancher-nfs service, and the error I received shows the correct mount command (literally the exact correct mount command), but, as I just said, there was an error. Perhaps curl isn't installed? Perhaps it's running the mount command in a script of some kind? Anyway, if I change the curl command to a direct server reference, it mounts fine and works exactly like the old convoy-efs in that a new directory is created for each storage volume set up in the Rancher UI.
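For reference, this is roughly the AZ-aware mount I was hoping the service would perform on each host (a manual sketch; 169.254.169.254 is the standard EC2 instance metadata endpoint, and the file system ID is the example from above):

# Look up this host's AZ, then mount the EFS target that lives in the same AZ
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
NFS_SERVER="${AZ}.fs-1ac6565853.efs.us-east-1.amazonaws.com"
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  "${NFS_SERVER}:/" /mnt/efs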

I'll dig into the rancher-nfs code (if it's available) and see if there's a way to get it to use the dynamic server reference instead of an absolute one.

So, as it turns out, the driver is fairly straightforward, and the script that runs and mounts the NFS share would need to be modified in order to accept a command like "$(curl -s http://###.###.###.###/latest/meta-data/placement/availability-zone).fs-1ac6565853.efs.us-east-1.amazonaws.com" in the server field. Inside the script, that environment variable is used in a very literal sense in the statement:

mntSrc="${NFS_SERVER}":"${MOUNT_DIR}"/"${OPTS[name]}"
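If I were to patch it, the naive change would be something like this (an untested sketch, and it would mean rebuilding my own copy of the rancher/storage-nfs image; eval-ing whatever a user typed into the server field is obviously not something the stock driver should do):

# Untested sketch: let the shell evaluate a command substitution embedded in NFS_SERVER
NFS_SERVER=$(eval echo "${NFS_SERVER}")
mntSrc="${NFS_SERVER}":"${MOUNT_DIR}"/"${OPTS[name]}"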

I think I’m stuck. I will either try to create my own “driver” based on a slight variation of the Rancher NFS service, or take a longer term view and stick to the previous version of Rancher and look towards updating our ideas and methods of persistent storage before upgrading to 1.2.

Hey Matt. I think there is a way to format the option to make this work. However, I am unable to get the stack to start up at all; I get the following errors.

time="2016-12-14T19:34:12Z" level=info msg=Running
time="2016-12-14T19:34:12Z" level=info msg=Starting
time="2016-12-14T19:34:12Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
time="2016-12-14T19:34:28Z" level=info msg=Running
time="2016-12-14T19:34:28Z" level=info msg=Starting
time="2016-12-14T19:34:28Z" level=fatal msg="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
time="2016-12-14T19:34:58Z" level=info msg=Running
time="2016-12-14T19:34:58Z" level=info msg=Starting

Have you seen this behavior? I assume it needs to attach to the host's docker.sock, but nothing in the YAML files mounts the socket. I guess I will try adding the docker.sock mount to the YAML and see how that turns out.

Anyways, if I figure that out, I will start working toward making this NFS mount a dynamic EFS mount.

No, I haven't seen that. Even when the NFS address I enter is nonsense, the plugin/service container always starts up. It's only when I try to use a storage volume from the plugin/service that I get an error indicating that mounting the NFS volume doesn't work.

I wish you luck. I'm at the point where I'm adapting the NFS plugin by modifying the script that's part of the rancher/storage-nfs image in my own private version of that image. If I'm successful, I'll describe what I did. It's a pain in the rear just to get back to the way things used to work, but the other benefits of moving to 1.2 will hopefully make it worth it.

If you wouldn’t mind, can you post the docker-compose.yml and rancher-compose.yml files for a successfully started up stack in your environment?

Again, keep in mind that while this starts and runs, when it comes time to mount a storage volume, it fails to do so because the curl command isn't run; it's just spit out verbatim.

I used the Rancher-NFS catalog item with the following answers:

MOUNT_DIR=/
MOUNT_OPTS=nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
NFS_SERVER=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone).fs-1ac10853.efs.us-east-1.amazonaws.com

docker-compose

version: '2'
services:
  nfs-driver:
    privileged: true
    image: rancher/storage-nfs:v0.6.0
    environment:
      MOUNT_DIR: /
      MOUNT_OPTS: nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
      NFS_SERVER: $$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone).fs-1ac10853.efs.us-east-1.amazonaws.com
    volumes:
      - /run:/run
      - /dev:/host/dev
      - /var/lib/rancher/volumes:/var/lib/rancher/volumes:shared
    logging:
      driver: json-file
      options:
        max-file: '2'
        max-size: 25m
    pid: host
    labels:
      io.rancher.container.agent.role: environment
      io.rancher.container.create_agent: 'true'
      io.rancher.scheduler.global: 'true'

rancher-compose

version: '2'
services:
  nfs-driver:
    storage_driver:
      description: null
      name: rancher-nfs
      volumeAccessMode: multiHostRW


Many thanks Matt. Appreciate the response. You are the only one that has responded to anything I have posted here. I always feel like I’m talking to myself. Maybe my questions are stupid ones?

So I guess I'm giving up on this for now. I created my own storage driver built on top of the existing NFS driver, but I can't figure out how to register it with Rancher. When I try to launch the stack, it fails with "getting driver ID: rancher-efs-single is not a driver registered with the current Rancher environment" (rancher-efs-single being my hacked-together way to regain the old convoy-efs behavior). It's all very frustrating to go from a situation we set up to accommodate exactly how Rancher worked with EFS to being stuck, because of a dramatic change in behavior, with no obvious upgrade path.

Ha! Amazon fixed the problem for me with their new EFS DNS names!

So, to reiterate the problem: the new rancher-efs service creates an EFS file system for every persistent volume you create in the Rancher UI. The old convoy-efs would instead just create a new directory in one EFS file system for each volume you created in the Rancher UI. Given the rather tight limits on the number of EFS file systems one is allowed to create, the old method was preferable for my company's use cases.

The new rancher-nfs service DOES in fact work like the old convoy-efs in that it creates a directory on a single NFS share for each persistent volume created in the Rancher UI. So why not just use the rancher-nfs service to access the EFS file system, you say? Well, there's a specific NFS target in each AWS availability zone for a single EFS file system, which means a different NFS host IP depending on which AZ your Docker/Rancher host node resides in. The lookup for that address used a curl command to find the right NFS target for mounting, and the rancher-nfs service would not interpolate that curl command; it just used it literally as the host address, which of course failed.

Well, now it seems that AWS has simplified things. Your EFS host for a specific file system now has just one universal DNS name, and that name resolves to a different IP depending on which AZ it's used from. Brilliant! With this change, we can use the rancher-nfs service for our EFS volumes beautifully. When setting up the rancher-nfs service, you will be asked for an NFS server, mount options, and a mount directory. The NFS server is simply the DNS name for your EFS file system, the mount options can be pulled from the EFS docs (nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2), and the directory is whatever directory on the EFS file system you want to use as the base location for your persistent volumes. I suspect most will use "/".
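As a sanity check, you can mount the file system by hand from one of the host nodes before pointing the service at it, roughly like this (same options as above, substituting your own file system ID):

# Quick manual check from a host node that the single DNS name mounts cleanly
sudo mkdir -p /mnt/efs-test
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-1ac10853.efs.us-east-1.amazonaws.com:/ /mnt/efs-test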

I now have my persistent storage working again and can migrate my full Rancher environment to 1.2.

Thanks for the update, I hadn’t heard about that change on the EFS side. I’ll definitely be looking into this myself.

Tried it myself tonight, and it was unable to look up the base DNS name… fs-efdaxxxx.efs.us-west-2.amazonaws.com.

When you say "it" here, what are you referring to? I would suggest first logging in to the host nodes where you're trying to set up the volume and doing an nslookup of the DNS name from there. If that doesn't work, then naturally the Docker container trying to mount the share won't be able to resolve it either. Once you figure out why the host can't resolve the name and fix that, I'm betting the Rancher service will start working as well.
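Something as simple as this, run on the host node itself, is what I mean (if it fails, my guess would be either the VPC's DNS resolution/DNS hostnames settings or a missing mount target in that host's AZ, but that's speculation from here):

# Run on the Rancher host node, not inside a container
nslookup fs-efdaxxxx.efs.us-west-2.amazonaws.com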

Just to be clear, for the purposes of my own testing, I:

  1. Created the EFS file system in AWS and made sure it had mount targets in all of the AZs where I was deploying Rancher host nodes
  2. Went to the "Library" for my environment and started the Rancher NFS stack as described in my previous post
  3. Went to the Infrastructure->Storage menu item and created a persistent volume for the rancher-nfs service
  4. Went to Infrastructure->Hosts, selected one of my hosts, and clicked "Add Container"; I then set up a no-frills Ubuntu container (the default) and attached the volume I just created
  5. Once the container was up, I went to it in the Rancher UI, selected "Execute Shell", went to the volume location, and created a few files
  6. Started a second container similar to the first, attached it to the same volume, and opened another shell to verify that the files were there
  7. Finally, just to be absolutely sure, I manually mounted the EFS file system on a completely separate EC2 instance and verified that the files I created were where I expected them to be

Hi Matt,

Reading through your dialog (with yourself :slight_smile:) I was able to get rancher-nfs set up with AWS EFS and persist data across EC2 instances and across AWS zones.

The only thing I still don't understand, and I think it would be helpful to round out this thread, is: how do we ensure that a stack is mounting its volumes on the EFS that rancher-nfs is using? Does that just happen automatically, or do we need to specify a specific path in the docker-compose config? Or are there some settings in the rancher-nfs stack I need to tweak? Also, if I had rancher-efs before, how do I port the volumes that were created there into AWS EFS (managed by rancher-nfs)?

Borrowing a quote from "The Office" (US):

“Why don’t you explain this to me like I was five.”

Thank you for your time, George.

It’s entirely possible I’m misunderstanding your question, so forgive me here if I’m giving you information you already have.

When you create the Rancher NFS service, you specify the NFS server. This will be a domain name like fs-a57d38ef.efs.us-east-1.amazonaws.com (this is the "new" thing AWS just recently added that made all this possible). Notice the very first part of that name: it is in fact the ID of a specific EFS file system. When you create persistent storage volumes in the Rancher UI or with the API using the Rancher NFS service, they will use that specific EFS file system. As far as I know, there is no way to have multiple Rancher NFS services running, so by default you will only ever be using that one EFS file system.

Hi Matt,

That was super helpful, thank you. It answered part of my question (I didn't know you could create persistent volumes via the Rancher UI). So then, in my docker-compose, how would I reference those volumes created via the Rancher UI? Is it simply the name of the volume?

So something like

version: '2'
services:
  postgres:
    volumes:
      - nameOfPersistentVolume:/var/lib/rancher/volumes

?

Is there an easy way to see where volumes live in the file system / mounted file systems via the CLI? I'd like to double-check that volumes are being stored on the EFS.

Sorry if this is out of scope of this forum.

George

This is a pseudo-excerpt from one of my docker-compose files (actually, it's the generated docker-compose from something I set up in the Rancher UI):

version: '2'
volumes:
  jenkins-server-volume:
    external: true
    driver: rancher-nfs
services:
  jenkins-server:
    image: jenkins:2.7.4
    stdin_open: true
    volumes:
    - jenkins-server-volume:/var/jenkins_home
    tty: true
    ports:
    - 8080:8080/tcp
    - 50000:50000/tcp
    labels:
      io.rancher.scheduler.affinity:host_label: subnet=private
      io.rancher.container.hostname_override: container_name

In this case, the persistent volume I set up in the Rancher UI was named "jenkins-server-volume". It's declared in the "volumes" section and then used in the "services" section. I'm not a docker-compose expert yet, so there may be a more straightforward approach. This is all described in more detail, with different examples for the different scopes, here: https://docs.rancher.com/rancher/v1.2/en/rancher-services/storage-service/

To answer your last question… I'm sure there are a dozen ways to view/verify the volume contents, but I just went to Infrastructure->Hosts in the Rancher UI, picked a host, and selected "Add Container". I then started a super basic, no-frills Ubuntu container and attached the storage volume to it at some simple location like "/verify". I then started the container and, again using the Rancher UI, executed a shell on it. From there I can look in the "/verify" directory and see the contents of the volume. You can even start a second container in exactly the same way and connect the same volume, then add or modify some data in one container, look in the directory in the second container, and make sure the data you added or changed was persisted.
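If you'd rather check from the command line on a host node, the driver bind-mounts its volumes under /var/lib/rancher/volumes on the host (that's the path in the rancher-nfs compose file earlier in this thread), so something along these lines should show them (a sketch; adjust the grep to taste):

# On a Rancher host node: show the NFS/EFS mounts the driver has created
mount | grep /var/lib/rancher/volumes
# Confirm the mount source is the EFS DNS name
df -h | grep efs
# And list the per-volume directories themselves
sudo ls /var/lib/rancher/volumes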
