Convoy-nfs fails to start

This is what I get from the logs; it keeps restarting and I'm not sure why it won't start. It all worked before upgrading to Rancher 1.0 and the new convoy 0.3.0, but it won't work on 0.2.0 now either.

Storage-pool logs:
31 March 2016 13:11:02 CEST  Waiting for metadata
31 March 2016 13:11:02 CEST  No help topic for 'convoy-nfs-pa'

nfs containers logs:
31 March 2016 13:12:28 CEST  Waiting for metadata
31 March 2016 13:12:28 CEST  time="2016-03-31T11:12:28Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/780/ns/mnt -F -- /var/lib/docker/aufs/mnt/d2b53665ae26cfad6ead074ec3510f4ba0e341c62f3a57dd672203c87cb009af/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338 -- /launch volume-agent-nfs-internal 780]"
31 March 2016 13:12:28 CEST  Registering convoy socket at /var/run/convoy-convoy-nfs-pa.sock
31 March 2016 13:12:28 CEST  Mounting at: /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338/mnt
31 March 2016 13:12:28 CEST  Mounting nfs. Command: mount -t nfs -o rw,hard,intr,fsc,proto=tcp,port=2049,v3,nolock x.x.x.x:/my_vol /var/lib/rancher/convoy/convoy-nfs-pa-a95f3417-af53-457a-918e-3b1896062338/mnt
31 March 2016 13:12:28 CEST  No help topic for 'convoy-nfs-pa'
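
To rule out the NFS export itself, the mount that convoy attempts can be run by hand on one of the internal-zone hosts. A rough sketch with a simplified option set (x.x.x.x and /my_vol are the placeholders from the log above):

# Try the same NFS mount convoy attempts, against a temporary directory.
mkdir -p /tmp/nfs-test
mount -t nfs -o rw,hard,intr,proto=tcp,port=2049,nolock x.x.x.x:/my_vol /tmp/nfs-test
ls /tmp/nfs-test    # should list the export contents if the mount worked
umount /tmp/nfs-test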

A minor update: when I deploy the stack without host affinity, the containers on the internal hosts run and start as normal. I run my Rancher hosts in AWS plus a DMZ and an internal zone, and company policy says NFS has to stay internal. But when I upgrade the services and change host affinity to zone=internal, it starts doing this.

@Jens_Skott this may not be related to your issue, but did you upgrade convoy-nfs from v0.2.0 to v0.3.0? v0.3.0 is actually not backwards compatible with v0.2.0. We note this in the description that appears on the convoy-nfs catalog page, but you may not have seen it.

I tried to redeploy 0.2.0 since it had some issues, and I tried to deploy a fresh 0.3.0 as well, and it failed.

@Jens_Skott that "No help topic for 'convoy-nfs-pa'" error is very strange.

Can you explain a little better about how you are deploying this? Are you doing anything custom or just deploying it from the catalog as-is?

And can you explain what you mean by this:

A minor update: when I deploy the stack without host affinity, the containers on the internal hosts run and start as normal. I run my Rancher hosts in AWS plus a DMZ and an internal zone, and company policy says NFS has to stay internal. But when I upgrade the services and change host affinity to zone=internal, it starts doing this.

What stack are you adding host affinity to and how exactly are you doing that?

OK @Jens_Skott, I’ve made some progress. I can produce a similar error message if the CATTLE_URL environment variable is not set inside of the container. We set this (and the access and secret key env vars) on the backend when we detect that the container is acting as an agent.

So, I think the upgrade you are doing for the host affinity change is somehow breaking this. When you give me more details on how you're doing that upgrade, I'll attempt to reproduce.
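
If you want to check that quickly, you can exec into one of the restarting containers and look for the agent variables. A rough sketch (the container ID is a placeholder for whatever docker ps shows for the convoy-nfs container):

# Check whether the agent environment variables were injected into the container.
docker exec <container-id> env | grep -E 'CATTLE_(URL|ACCESS_KEY|SECRET_KEY)'

If that grep comes back empty, it would match the case I reproduced above.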

I deploy it straight out of the catalogue without starting the services, then I upgrade the individual services and add a host should/must-contain label with key zone and value internal, since our policy says that storage has to be on a layer 2 network and only the hosts labelled internal can reach the storage.
It worked in the version previous to Rancher 1.0 and convoy 0.3.0, but it doesn't work now.
When I do this, the containers stop and start constantly with the errors above.
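
For what it's worth, one way to confirm the affinity change actually lands on the containers after the upgrade is to look at the labels Rancher puts on them. A rough sketch (the container ID is a placeholder, and the exact scheduler label keys are an assumption on my part):

# Dump the labels on a convoy-nfs container and look for Rancher scheduling labels.
docker inspect --format '{{ json .Config.Labels }}' <container-id> | tr ',' '\n' | grep io.rancher.scheduler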

I tried today to just deploy and start it right away, without doing the upgrade that adds the key/value label to lock it down to specific hosts. All containers on the internal-zone hosts came up as running, and the rest stayed initializing since they get an NFS timeout to the NFS server, as expected. That's what I find weird.

Made a small video of what I did.

Any luck reproducing and hopefully fixing this issue @cjellick?

@Jens_Skott to be honest, I missed your last reply where you added the video. I just watched it now though. I will investigate and get back to you. Looks like it should be pretty reproducible.

I can confirm I am seeing the exact same issue with a fresh install of Rancher 1.0.1 and convoy-nfs 0.3.0. Everything works fine until the affinities are applied. I even tried copying the config from a working version deployed directly from the catalog, with the addition of the affinities. The result was the convoy-nfs containers throwing errors about being unable to find metadata.

On another note, I recall seeing a post about convoy-glusterfs having the same sort of issue with affinities, but I can't seem to find it now. Between the GitHub issues and the forum, it can be a little difficult to track things down.

Can you link the GitHub issue so we can follow it there?

I have exactly the same problem.

+1
Same problem on 1.0.1.

Tried it now in 1.1.0-dev5 with Amazon EFS, and not starting it, then upgrading it and locking it to a zone with scheduling, now works. Still having some other issues, but working through them as we speak =)