Convoy initializing stuck


#1

I tried to set up Convoy with this howto:

So whenever the containers come up the logs look like this:

10/18/2016 10:55:14 PM Waiting for metadata
10/18/2016 10:55:15 PM time="2016-10-18T20:55:15Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/19834/ns/mnt -F -- /var/lib/docker/aufs/mnt/d0d5b99eacca64567698ad55a5087f0a5b9b6cde2638bfb61adc04934c0ba861/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/Not found-Not found -- /launch volume-agent-nfs-internal 19834]"
10/18/2016 10:55:15 PM Timestamp: 2016-10-18 20:55:15.959195851 +0000 UTC
10/18/2016 10:55:15 PM Code: System error
10/18/2016 10:55:15 PM
10/18/2016 10:55:15 PM Message: stat found-Not: no such file or directory
[...]

I’m using:

  • Docker version 1.12.1, build 23cf638 on Debian 8.3
  • Rancher v1.2.0-pre4-rc2
  • Cattle v0.170.1
  • User Interface v1.2.13
  • Rancher Compose v0.11.0-rc2

Note the buggy nsenter command above, especially "--stage2 /var/lib/rancher/convoy/Not found-Not found --".

On the host you can then find something like this:

$:/var/lib/rancher/convoy# ls -l
total 8
drwxr-xr-x 3 root root 4096 Oct 18 20:55 convoy-nfs-2d88e5cd-7c48-49da-8af0-06a94ae1e757
drwxr-xr-x 2 root root 4096 Oct 18 20:55 Not
$:/var/lib/rancher/convoy#
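
One possible reading of the stray "Not" directory and the "stat found-Not" error is plain word splitting: if the two metadata lookups each returned the text "Not found" and the resulting path was passed on unquoted, it would break into three separate arguments. The sketch below only illustrates that effect. The variable names and the show_args helper are hypothetical, and whether the split really happens in a shell script inside convoy-agent is an assumption.

```shell
#!/bin/sh
# Hypothetical values: what the two metadata lookups appear to have returned.
stack_name="Not found"
stack_uuid="Not found"
dir="/var/lib/rancher/convoy/${stack_name}-${stack_uuid}"

# Helper (hypothetical): print each argument on its own line in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

echo "unquoted:"
# $dir is split on spaces into .../Not, found-Not and found.
show_args --stage2 $dir --

echo "quoted:"
# "$dir" stays a single argument, spaces included.
show_args --stage2 "$dir" --
```

With the unquoted form, the first piece is a path ending in "Not" and the second is "found-Not", which matches both the directory listing and the stat error above.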

I've installed nfs-server and nfs-common. I tried it with and without nfsvers=4, and I always used the latest version in the catalog (v0.4.0).

My gut feeling tells me I'm missing something obvious here. Any ideas? I tried this with docker-nfs and also with a manually set up NFS server.

thanks

Kim


#2

It seems that this line is created here:

share-mnt-v0.1.2/main.go#L160

The os.Args[1:]... values are probably (eventually) specified here:

So my guess is that these curl commands are failing for some reason. However, if I try them manually, they seem to work:

$ docker exec -it 397ae0862554 bash
root@397ae0862554:/# curl -s http://rancher-metadata/2015-07-25/self/stack/name
convoy-nfsroot@397ae0862554:/# curl -s http://rancher-metadata/2015-07-25/self/stack/uuid
294d5998-2611-431b-b23b-16290748f843root@397ae0862554:/#

So maybe there is a wait_for_metadata missing here:
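
(The linked script is not reproduced in this post. As a rough sketch of the idea only, and not the actual convoy-agent code: a guard of this kind would poll the metadata service until it returns a real value before the directory name is built. The fetch_metadata helper, the retry count, and the RETRY_DELAY variable are assumptions made for illustration; only the URL prefix and the two paths come from the curl commands above.)

```shell
#!/bin/sh
# Hypothetical sketch, not the convoy-agent source.
METADATA_URL="${METADATA_URL:-http://rancher-metadata/2015-07-25}"

# Fetch one metadata path. -s silences progress output, -f makes curl
# return a non-zero exit status on HTTP errors.
fetch_metadata() {
    curl -s -f "${METADATA_URL}/$1"
}

# Poll until the lookup yields a usable value, or give up after max_tries.
wait_for_metadata() {
    path="$1"
    max_tries="${2:-30}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if value=$(fetch_metadata "$path") && [ -n "$value" ] && [ "$value" != "Not found" ]; then
            printf '%s\n' "$value"
            return 0
        fi
        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    echo "metadata not available: $path" >&2
    return 1
}

# Intended use (needs a reachable metadata service):
#   stack_name=$(wait_for_metadata self/stack/name) || exit 1
#   stack_uuid=$(wait_for_metadata self/stack/uuid) || exit 1
```

Rejecting the literal text "Not found" is the important part here, since that is exactly what ended up in the directory name in the first log.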


#3

Indeed, the behaviour changed after adding that line:

10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering POST, /volumes/umount" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering POST, /snapshots/create" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering POST, /backups/create" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering DELETE, /volumes/" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering DELETE, /snapshots/" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering DELETE, /backups" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Path" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Get" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.List" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /Plugin.Activate" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Create" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Remove" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Mount" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=debug msg="Registering plugin handler POST, /VolumeDriver.Unmount" pkg=daemon
10/19/2016 6:32:55 AM time="2016-10-19T04:32:55Z" level=warning msg="Remove previous sockfile at /host/var/run/convoy-convoy-nfs.sock" pkg=daemon
10/19/2016 7:11:13 AM time="2016-10-19T05:11:13Z" level=debug msg="Handle plugin list volume: POST /VolumeDriver.List" pkg=daemon
10/19/2016 7:11:13 AM time="2016-10-19T05:11:13Z" level=debug msg="Successfully got volume list for docker." pkg=daemon
10/19/2016 7:11:13 AM time="2016-10-19T05:11:13Z" level=debug msg="Response: {}" pkg=daemon
10/19/2016 8:15:10 AM Waiting for metadata
10/19/2016 8:15:11 AM time="2016-10-19T06:15:11Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/15289/ns/mnt -F -- /var/lib/docker/aufs/mnt/62fe8e55b40d979387aafc79365027206b1f5e0093d24604ed325889818d4492/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-nfs-294d5998-2611-431b-b23b-16290748f843 -- /launch volume-agent-nfs-internal 15289]"
10/19/2016 8:15:11 AM Registering convoy socket at /var/run/convoy-convoy-nfs.sock
10/19/2016 8:15:11 AM Mounting at: /var/lib/rancher/convoy/convoy-nfs-294d5998-2611-431b-b23b-16290748f843/mnt

I will probably create an issue on GitHub ...


#4

convoy-agent doesn’t have a bugtracker. Anyway, PR created:

Sorry, new users can only put 2 links in a post.

github.com/rancher/convoy-agent/pull/35


#5

@k9ert I replied to your PR. We can continue the conversation there. Also, I opened this issue to track: https://github.com/rancher/rancher/issues/6387. Because of the large number of projects/microservices we have, we centralize bug tracking in the rancher repo.