Convoy-nfs is stuck initializing


#1

I have a two-host Rancher environment. It’s running on top of RancherOS. Everything is the newest version that was out as of tonight. I installed convoy-nfs. It spun up fine on one of my nodes, but another is stuck in an “Initializing” state without ever passing its health check. On the host that is stuck, I can ssh in and see that the nfs share is mounted, and writeable by the root user.

Here’s the output of “docker logs” for the stuck container. Ideas?


Waiting for metadata.time="2016-01-15T03:10:40Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/556/ns/mnt -F -- /var/lib/docker/overlay/4f43da0e5debb337b85611090bbf8645742a4b6266e39f054e61a7f6b572a828/merged/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08 -- /launch volume-agent-nfs-internal]" 
Registering convoy socket at /var/run/conoy-convoy-nfs.sock
Mounting at: /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt
Mounting nfs. Command: mount -t nfs -o nfsvers=4 192.168.10.9:/spud/export/rancher /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt
time="2016-01-15T03:10:41Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck" 
time="2016-01-15T03:10:41Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" 
time="2016-01-15T03:10:41Z" level=info msg="Got: drivers [vfs]" 
time="2016-01-15T03:10:41Z" level=info msg="Got: driver-opts [vfs.path=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt]" 
time="2016-01-15T03:10:41Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/conoy-convoy-nfs.sock daemon --root=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08 --drivers=vfs --driver-opts=vfs.path=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt]" 
time="2016-01-15T03:10:41Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg= driver=vfs driver_opts=map[vfs.path:/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" 
time="2016-01-15T03:10:41Z" level=debug msg= driver=vfs event=init pkg=daemon reason=complete 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /info" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /uuid" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /volumes/list" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /volumes/" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /snapshots/" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /backups/list" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering GET, /backups/inspect" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering POST, /volumes/mount" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering POST, /volumes/umount" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering POST, /snapshots/create" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering POST, /backups/create" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering POST, /volumes/create" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering DELETE, /volumes/" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering DELETE, /snapshots/" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering DELETE, /backups" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /Plugin.Activate" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Create" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Remove" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Mount" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Unmount" pkg=daemon 
time="2016-01-15T03:10:41Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Path" pkg=daemon 
Waiting for metadata.time="2016-01-15T03:17:26Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/556/ns/mnt -F -- /var/lib/docker/overlay/4f43da0e5debb337b85611090bbf8645742a4b6266e39f054e61a7f6b572a828/merged/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08 -- /launch volume-agent-nfs-internal]" 
Registering convoy socket at /var/run/conoy-convoy-nfs.sock
Mounting at: /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt
Waiting for metadatatime="2016-01-15T03:31:54Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/458/ns/mnt -F -- /var/lib/docker/overlay/4f43da0e5debb337b85611090bbf8645742a4b6266e39f054e61a7f6b572a828/merged/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08 -- /launch volume-agent-nfs-internal]" 
Registering convoy socket at /var/run/conoy-convoy-nfs.sock
Mounting at: /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt
Mounting nfs. Command: mount -t nfs -o nfsvers=4 192.168.10.9:/spud/export/rancher /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt
time="2016-01-15T03:31:54Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck" 
time="2016-01-15T03:31:54Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" 
time="2016-01-15T03:31:54Z" level=info msg="Got: drivers [vfs]" 
time="2016-01-15T03:31:54Z" level=info msg="Got: driver-opts [vfs.path=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt]" 
time="2016-01-15T03:31:54Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/conoy-convoy-nfs.sock daemon --root=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08 --drivers=vfs --driver-opts=vfs.path=/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt]" 
time="2016-01-15T03:31:54Z" level=debug msg="Found existing config. Ignoring command line opts, loading config from /var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg= driver=vfs driver_opts=map[vfs.path:/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08/mnt] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-nfs-02253c61-50ad-440f-97b5-085265688d08" 
time="2016-01-15T03:31:54Z" level=debug msg= driver=vfs event=init pkg=daemon reason=complete 
time="2016-01-15T03:31:54Z" level=debug msg="Registering DELETE, /volumes/" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering DELETE, /snapshots/" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering DELETE, /backups" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /snapshots/" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /backups/list" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /backups/inspect" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /info" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /uuid" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /volumes/list" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering GET, /volumes/" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering POST, /volumes/umount" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering POST, /snapshots/create" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering POST, /backups/create" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering POST, /volumes/create" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering POST, /volumes/mount" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Unmount" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Path" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /Plugin.Activate" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Create" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Remove" pkg=daemon 
time="2016-01-15T03:31:54Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Mount" pkg=daemon 

#2

Those logs actually look fine, ie that is what a successful startup would look like.
If you have two rancher hosts on which convoy-nfs is running, then you should have a total of three convoy-agent containers: two running the volume-agent command and one running the storagepool-agent command. Is there anything of interest from the storagepool-agent container’s logs? Do you see any errors in the rancher-server logs?

Do any of the containers in the stack ever restart or does it look like they are staying up?


#3

The containers were staying up. The convoy-nfs container on the one remote host was just stuck in an initializing state, with the above log, never passing its health check. I spent some more time with it last night after posting this and what I ended up doing was:

  1. Remove the Rancher agent from the machine running the Rancher server container.
  2. Purge all other non rancher-server containers from the same host.
  3. Remove the convoy-nfs containers that remained from the remote agent (there is only one remote agent in the homelab presently).
  4. Re-deploy the convoy-nfs stack.

This “just worked”. I was up late hacking and getting loopy and maybe a little too buzzed on caffeine. My “TIL” moment of the night was “don’t run rancher-agent on the same host as rancher-server”. It might have been a gross oversimplification of things. I don’t have enough NUCs here yet to validate my impression. Granted, I’m still early in the Docker/Rancher/RancherOS learning curve.


#4

Hi. I try to deploy convoy-nfs and succefully deploy to 3 of 4 my servers. Trouble server - it’s main server with rancher server container. There no errors in logs, last message: time=“2016-01-22T12:03:30Z” level=warning msg=“Remove previous sockfile at /host/var/run/conoy-convoy-nfs.sock” pkg=daemon
State: “Initializing”. What i do wrong?


#6

Hi everyone,
I as well am having the same issues,
Tried the workaround. And it fails container on the second node is initializing.


#7

Hi !
I’m also facing the same problem here.
I have posted an issue on the convoy github:

https://github.com/rancher/convoy/issues/96

Anyone manage to solve this issue ?

Thanks a lot !


#8

@ClementVidal let’s continue this convo in the github issue.


#9

I had the same problem in my testing environnement and was caused by a missconfiguration of the rancher-agent. I had to add the HOST IP because the cross-host networking wasn’t working.

I also had to add a simlink from /host/var/run/convoy-convoy-nfs.sock to /var/run/convoy/convoy.sock in each container created by the catalogue.