Convoy-nfs fails to start

This is what I get from the logs; it keeps restarting and I'm not sure why it won't start. It all worked before upgrading to Rancher 1.0 and the new convoy 0.3.0, but it won't work on 0.2.0 now either.

Storage-pool logs:
31 March 2016 13:11:02 CEST  Waiting for metadata
31 March 2016 13:11:02 CEST  No help topic for 'convoy-nfs-pa'

nfs containers logs:
31 March 2016 13:12:28 CEST  Waiting for metadata
31 March 2016 13:12:28 CEST  time="2016-03-31T11:12:28Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/780/ns/mnt -F -- /var/lib/docker/aufs/mnt/d2b53665ae26cfad6ead074ec3510f4ba0e341c62f3a57dd672203c87cb009af/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338 -- /launch volume-agent-nfs-internal 780]"
31 March 2016 13:12:28 CEST  Registering convoy socket at /var/run/convoy-convoy-nfs-pa.sock
31 March 2016 13:12:28 CEST  Mounting at: /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338/mnt
31 March 2016 13:12:28 CEST  Mounting nfs. Command: mount -t nfs -o rw,hard,intr,fsc,proto=tcp,port=2049,v3,nolock x.x.x.x:/my_vol /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338/mnt
31 March 2016 13:12:28 CEST  No help topic for 'convoy-nfs-pa'
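
To rule out the NFS export itself, the mount that convoy attempts can be run by hand on one of the internal-zone hosts. A rough sketch with a simplified option set (x.x.x.x and /my_vol are the placeholders from the log above):

# Try the same NFS mount convoy attempts, against a temporary directory.
mkdir -p /tmp/nfs-test
mount -t nfs -o rw,hard,intr,proto=tcp,port=2049,nolock x.x.x.x:/my_vol /tmp/nfs-test
ls /tmp/nfs-test    # should list the export contents if the mount worked
umount /tmp/nfs-test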

A minor update: when I deploy the stack without host affinity, the containers on the internal hosts run and start as normal. I run my Rancher hosts in AWS plus a DMZ and an internal zone, and company policy says NFS has to stay internal. But when I upgrade the services and change host affinity to zone=internal, it starts doing this.

@Jens_Skott this may not be related to your issue, but did you upgrade convoy-nfs from v0.2.0 to v0.3.0? v0.3.0 is actually not backwards compatible with v0.2.0. We note this in the description that appears on the convoy-nfs catalog page, but you may not have seen it.

I tried to redeploy 0.2.0 since it had some issues, and I tried to deploy a fresh 0.3.0 as well, and it failed.

@Jens_Skott that "No help topic for 'convoy-nfs-pa'" error is very strange.

Can you explain a little better about how you are deploying this? Are you doing anything custom or just deploying it from the catalog as-is?

And can you explain what you mean by this:

A minor update: when I deploy the stack without host affinity, the containers on the internal hosts run and start as normal. I run my Rancher hosts in AWS plus a DMZ and an internal zone, and company policy says NFS has to stay internal. But when I upgrade the services and change host affinity to zone=internal, it starts doing this.

What stack are you adding host affinity to and how exactly are you doing that?

OK @Jens_Skott, I’ve made some progress. I can produce a similar error message if the CATTLE_URL environment variable is not set inside of the container. We set this (and the access and secret key env vars) on the backend when we detect that the container is acting as an agent.

So, I think the upgrade you are doing for the host affinity change is somehow breaking this. When you give me more details on how you're doing that upgrade, I'll attempt to reproduce.
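
If you want to check that quickly, you can exec into one of the restarting containers and look for the agent variables. A rough sketch (the container ID is a placeholder for whatever docker ps shows for the convoy-nfs container):

# Check whether the agent environment variables were injected into the container.
docker exec <container-id> env | grep -E 'CATTLE_(URL|ACCESS_KEY|SECRET_KEY)'

If that grep comes back empty, it would match the case I reproduced above.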

I deploy it straight out of the catalogue without starting the services, then I upgrade the individual services and add a host should/must-contain label with key zone and value internal, since our policy says that storage has to be on a layer 2 network and only the hosts labelled internal can reach the storage.
It worked in the version previous to Rancher 1.0 and convoy 0.3.0, but it doesn't work now.
When I do this, the containers stop and start constantly with the errors above.
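
For what it's worth, one way to confirm the affinity change actually lands on the containers after the upgrade is to look at the labels Rancher puts on them. A rough sketch (the container ID is a placeholder, and the exact scheduler label keys are an assumption on my part):

# Dump the labels on a convoy-nfs container and look for Rancher scheduling labels.
docker inspect --format '{{ json .Config.Labels }}' <container-id> | tr ',' '\n' | grep io.rancher.scheduler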

I tried today to just deploy and start it right away, without doing the upgrade that adds the key/value label to lock it down to specific hosts. All containers on the internal-zone hosts came up as running, and the rest stayed initializing since they get an NFS timeout to the NFS server, as expected. That's what I find weird.

Made a small video of what I did.

Any luck reproducing and hopefully fixing this issue @cjellick?

@Jens_Skott to be honest, I missed your last reply where you added the video. I just watched it now though. I will investigate and get back to you. Looks like it should be pretty reproducible.

I can confirm I am seeing the exact same issue with a fresh install of Rancher 1.0.1 and convoy-nfs 0.3.0. Everything works fine until the affinities are applied. I even tried copying the config from a working version deployed directly from the catalog, with the addition of the affinities. The result was the convoy-nfs containers throwing errors about being unable to find metadata.

On another note, I recall seeing a post about convoy-glusterfs having the same sort of issue with affinities, but I can't seem to find it now. Between the GitHub issues and the forum, it can be a little difficult to track things down.

Can you link the GitHub issue so we can follow it there?

I have exactly the same problem.

+1
Same problem on 1.0.1.

Tried it now in 1.1.0-dev5 with Amazon EFS, and not starting it, then upgrading it and locking it to a zone with scheduling, now works. Still having some other issues, but working through them as we speak =)