No connection between hosts

I have a problem and I don’t know how to fix it. I’m trying to install GlusterFS on hosts from www.scaleway.com, but I have connection problems between my containers.

I can:

  • See my hosts in the Rancher interface
  • Ping each host from the other hosts
  • Ping between containers on the same host

But I can’t, for example, ping between two Rancher network agents on different hosts.
I have the same problem with Docker 1.9 and 1.10, but with 1.10 I also hit a bug in the CPU/memory usage graph, so I’m currently using Docker 1.9.

At first I thought it was a problem specific to the GlusterFS stack. But when I test the ownCloud stack with the database on a different host than the web server, the web server can’t see the database. So I think it’s something specific to the network agent, and possibly to the host configuration.
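
The symptom described above can be reproduced from a host shell with a small check like the one below. This is only a sketch: it assumes the `docker` CLI is available on the host and that the container image ships `ping`. The function name is made up for illustration, and the container name and 10.42.x.x address you pass in have to be looked up in the Rancher UI.

```sh
#!/bin/sh
# Sketch: ping a container's Rancher-managed IP from inside another container.
# Arguments are placeholders:
#   $1 = name or ID of a container on THIS host
#   $2 = managed-network IP (10.42.x.x) of a container on ANOTHER host
ping_from_container() {
    container="$1"
    target_ip="$2"
    # -c 3: send three probes; -W 2: wait at most 2 seconds for each reply.
    docker exec "$container" ping -c 3 -W 2 "$target_ip"
}
```

Calling `ping_from_container <local-container> <remote-10.42-ip>` returns a non-zero exit status when no replies come back, which is the failure described in this post.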

My ifconfig output:

docker0   Link encap:Ethernet  HWaddr 02:42:ab:62:56:9b  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:abff:fe62:569b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:116235 errors:0 dropped:0 overruns:0 frame:0
          TX packets:105240 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:8599738 (8.5 MB)  TX bytes:86753953 (86.7 MB)

eth0      Link encap:Ethernet  HWaddr de:19:44:06:f0:03  
          inet addr:10.2.2.135  Bcast:10.255.255.255  Mask:255.255.255.254
          inet6 addr: fe80::dc19:44ff:fe06:f003/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:502778 errors:0 dropped:0 overruns:0 frame:0
          TX packets:493109 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1820554074 (1.8 GB)  TX bytes:126697320 (126.6 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:131122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:131122 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:1360754700 (1.3 GB)  TX bytes:1360754700 (1.3 GB)

veth60f2f0a Link encap:Ethernet  HWaddr 92:c3:da:7a:dd:ff  
          inet6 addr: fe80::90c3:daff:fe7a:ddff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1720 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:139011 (139.0 KB)  TX bytes:19941488 (19.9 MB)

vethf8d6c6d Link encap:Ethernet  HWaddr 26:50:3d:d6:22:eb  
          inet6 addr: fe80::2450:3dff:fed6:22eb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)

My iptables output:

iptables -L -n --line-numbers -t nat
Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    CATTLE_PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
2    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    CATTLE_POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
2    MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
3    MASQUERADE  udp  --  172.17.0.2           172.17.0.2           udp dpt:4500
4    MASQUERADE  udp  --  172.17.0.2           172.17.0.2           udp dpt:500

Chain CATTLE_POSTROUTING (1 references)
num  target     prot opt source               destination         
1    ACCEPT     all  --  10.42.0.0/16         169.254.169.250     
2    MASQUERADE  tcp  --  10.42.0.0/16        !10.42.0.0/16         masq ports: 1024-65535
3    MASQUERADE  udp  --  10.42.0.0/16        !10.42.0.0/16         masq ports: 1024-65535
4    MASQUERADE  all  --  10.42.0.0/16        !10.42.0.0/16        
5    MASQUERADE  tcp  --  172.17.0.0/16        0.0.0.0/0            masq ports: 1024-65535
6    MASQUERADE  udp  --  172.17.0.0/16        0.0.0.0/0            masq ports: 1024-65535

Chain CATTLE_PREROUTING (1 references)
num  target     prot opt source               destination         

Chain DOCKER (2 references)
num  target     prot opt source               destination         
1    DNAT       udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:4500 to:172.17.0.2:4500
2    DNAT       udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:500 to:172.17.0.2:500

What version of Rancher? What type of Host do you have?

Are the host IPs shown in the Rancher UI the correct ones?

  • I have the latest version of Rancher: 63.1
  • I’m using www.scaleway.com and their VPS solution (Starter VPS)
  • The IPs are correct.

I’ve been having an almost identical issue, except that I’m using docker-machine and the vmwarevsphere driver to spin up boot2docker hosts, and my Docker version is 1.10.3. Symptomatically, my results are totally consistent with what OP reported.

Here are the steps I’ve taken so far:

  1. Following the suggestions in the Rancher docs, I re-created the Rancher Agents and added the CATTLE_AGENT_IP environment variable.
  2. I manually modified the iptables on the host machines like so:
sudo iptables -t nat -A CATTLE_PREROUTING ! -i docker0 -p udp -m addrtype --dst-type LOCAL -m udp --dport 4500 -j DNAT --to-destination <rancher-agent IP>:4500
sudo iptables -t nat -A CATTLE_PREROUTING ! -i docker0 -p udp -m addrtype --dst-type LOCAL -m udp --dport 500 -j DNAT --to-destination <rancher-agent IP>:500
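
To avoid typing the agent IP twice, the two rules from step 2 can be generated from one argument. The sketch below only prints the commands so they can be reviewed first. The function name is hypothetical, and the rule text is copied from the commands above.

```sh
#!/bin/sh
# Sketch: print (do not run) the two DNAT rules from step 2 for a given
# agent IP. Port 4500 is IPsec NAT traversal, port 500 is IKE.
print_ipsec_dnat_rules() {
    agent_ip="$1"
    for port in 4500 500; do
        echo "iptables -t nat -A CATTLE_PREROUTING ! -i docker0 -p udp" \
             "-m addrtype --dst-type LOCAL -m udp --dport ${port}" \
             "-j DNAT --to-destination ${agent_ip}:${port}"
    done
}
```

For example, `print_ipsec_dnat_rules 172.17.0.2` prints both rules for the agent address that appears in the logs in this thread; substitute your own. After applying them as root, `sudo iptables -t nat -L CATTLE_PREROUTING -n --line-numbers` shows whether they are in place.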

I keep seeing these entries in my Agents’ /var/log/rancher-net.log:

time="2016-03-21T19:30:46Z" level=info msg="Unknown SPIs: map[string]int{\"0\":1}, Conns: []goStrongswanVici.VpnConnInfo{goStrongswanVici.VpnConnInfo{IkeSa:goStrongswanVici.IkeSa{Uniqueid:\"4\", Version:\"2\", State:\"ESTABLISHED\", Local_host:\"172.17.0.2\", Local_id:\"172.17.0.2\", Remote_host:\"192.168.16.225\", Remote_id:\"172.17.0.2\", Remote_xauth_id:\"\", Initiator:\"\", Initiator_spi:\"6fe5dd94802b53ab\", Responder_spi:\"70d43ab584fc4bdb\", Encr_alg:\"AES_CBC\", Encr_keysize:\"128\", Integ_alg:\"HMAC_SHA1_96\", Integ_keysize:\"\", Prf_alg:\"PRF_HMAC_SHA1\", Dh_group:\"MODP_2048\", Established:\"9848\", Rekey_time:\"4033\", Reauth_time:\"\", Child_sas:map[string]goStrongswanVici.Child_sas{}}, Child_sas:goStrongswanVici.Child_sas{Reqid:\"\", State:\"\", Mode:\"\", Protocol:\"\", Encap:\"\", Spi_in:\"\", Spi_out:\"\", Cpi_in:\"\", Cpi_out:\"\", Encr_alg:\"\", Encr_keysize:\"\", Integ_alg:\"\", Integ_keysize:\"\", Prf_alg:\"\", Dh_group:\"\", Esn:\"\", Bytes_in:\"\", Packets_in:\"\", Use_in:\"\", Bytes_out:\"\", Packets_out:\"\", Use_out:\"\", Rekey_time:\"\", Life_time:\"\", Install_time:\"\"}, IkeSaName:\"conn-192.168.16.225\", ChildSaName:\"\"}, goStrongswanVici.VpnConnInfo{IkeSa:goStrongswanVici.IkeSa{Uniqueid:\"3\", Version:\"2\", State:\"ESTABLISHED\", Local_host:\"172.17.0.2\", Local_id:\"172.17.0.2\", Remote_host:\"192.168.16.224\", Remote_id:\"172.17.0.2\", Remote_xauth_id:\"\", Initiator:\"\", Initiator_spi:\"3863b4674e5b3cb6\", Responder_spi:\"f6b23705a5048211\", Encr_alg:\"AES_CBC\", Encr_keysize:\"128\", Integ_alg:\"HMAC_SHA1_96\", Integ_keysize:\"\", Prf_alg:\"PRF_HMAC_SHA1\", Dh_group:\"MODP_2048\", Established:\"9859\", Rekey_time:\"4481\", Reauth_time:\"\", Child_sas:map[string]goStrongswanVici.Child_sas{}}, Child_sas:goStrongswanVici.Child_sas{Reqid:\"\", State:\"\", Mode:\"\", Protocol:\"\", Encap:\"\", Spi_in:\"\", Spi_out:\"\", Cpi_in:\"\", Cpi_out:\"\", Encr_alg:\"\", Encr_keysize:\"\", Integ_alg:\"\", Integ_keysize:\"\", Prf_alg:\"\", 
Dh_group:\"\", Esn:\"\", Bytes_in:\"\", Packets_in:\"\", Use_in:\"\", Bytes_out:\"\", Packets_out:\"\", Use_out:\"\", Rekey_time:\"\", Life_time:\"\", Install_time:\"\"}, IkeSaName:\"conn-192.168.16.224\", ChildSaName:\"\"}, goStrongswanVici.VpnConnInfo{IkeSa:goStrongswanVici.IkeSa{Uniqueid:\"1\", Version:\"2\", State:\"ESTABLISHED\", Local_host:\"172.17.0.2\", Local_id:\"172.17.0.2\", Remote_host:\"192.168.16.224\", Remote_id:\"172.17.0.2\", Remote_xauth_id:\"\", Initiator:\"yes\", Initiator_spi:\"81d2d18058f63898\", Responder_spi:\"300d6004536930ee\", Encr_alg:\"AES_CBC\", Encr_keysize:\"128\", Integ_alg:\"HMAC_SHA1_96\", Integ_keysize:\"\", Prf_alg:\"PRF_HMAC_SHA1\", Dh_group:\"MODP_2048\", Established:\"9863\", Rekey_time:\"3714\", Reauth_time:\"\", Child_sas:map[string]goStrongswanVici.Child_sas{}}, Child_sas:goStrongswanVici.Child_sas{Reqid:\"\", State:\"\", Mode:\"\", Protocol:\"\", Encap:\"\", Spi_in:\"\", Spi_out:\"\", Cpi_in:\"\", Cpi_out:\"\", Encr_alg:\"\", Encr_keysize:\"\", Integ_alg:\"\", Integ_keysize:\"\", Prf_alg:\"\", Dh_group:\"\", Esn:\"\", Bytes_in:\"\", Packets_in:\"\", Use_in:\"\", Bytes_out:\"\", Packets_out:\"\", Use_out:\"\", Rekey_time:\"\", Life_time:\"\", Install_time:\"\"}, IkeSaName:\"conn-192.168.16.224\", ChildSaName:\"\"}}"

One other bit of info:

docker@dev-dockerhost02:~$ sudo netstat -lu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 :::4500                 :::*
udp        0      0 :::500                  :::*

OK, so I tested the RC. One difference from the previous version:
I have to set CATTLE_AGENT_IP when I launch the agent. If I don’t, it picks the Docker bridge IP: 172.17.0.1
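
As a rough illustration of setting CATTLE_AGENT_IP at launch: the variable is passed to the agent container with `-e` on its `docker run` command. The helper below is a hypothetical sketch that splices that flag into whatever registration command the Rancher UI shows. It assumes the command contains the text `docker run ` and prints the result instead of running it.

```sh
#!/bin/sh
# Sketch: add "-e CATTLE_AGENT_IP=<ip>" to an agent registration command.
#   $1      = IP the host should register with
#   $2...   = the registration command as copied from the Rancher UI
# The command is printed, not executed.
with_agent_ip() {
    agent_ip="$1"
    shift
    printf '%s\n' "$*" | sed "s/docker run /docker run -e CATTLE_AGENT_IP=${agent_ip} /"
}
```

With the eth0 address from the ifconfig output earlier in the thread, that would be `with_agent_ip 10.2.2.135 "$REGISTRATION_COMMAND"`, where `REGISTRATION_COMMAND` is a placeholder variable holding the command copied from the UI.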

Second point: I get two error messages on the agent:

23/03/2016 12:44:36INFO: Running /var/lib/cattle/download/ipsec/ipsec-1-bd2fca96de41d6264c4160a4099fb092f5f057e4960a9265420c85c497503d37/apply.sh
23/03/2016 12:44:36RTNETLINK answers: No such file or directory

And:

23/03/2016 12:44:37INFO: Running /var/lib/cattle/download/iptables/iptables-2-bf64a7ed197a703a43d7e7f0579eec66fdcb1530ad911e47640f6a12688d7853/apply.sh
23/03/2016 12:44:37SIOCSARP: Invalid argument
23/03/2016 12:44:37arp: cannot set entry on line 2 of etherfile content-home/etc/cattle/ethers !

Also, at the end I have:
23/03/2016 12:44:38Starting monit daemon with http interface at [localhost:2812]
I’m not sure whether “localhost” is OK here?

The complete logs for the agent:

23/03/2016 12:44:33INFO: Downloading http://10.2.13.131:8080/v1//configcontent//rancher-net current=
23/03/2016 12:44:34INFO: Running /var/lib/cattle/download/rancher-net/1-09b6a579e702e4782916162e7861e8a3c9fbe3ee018a5cb2e1e961dbd4ac56dc/apply.sh
23/03/2016 12:44:34Extracting strongswan/strongswan-5.3.5-1.tar.gz
23/03/2016 12:44:35INFO: HOME -> ./
23/03/2016 12:44:35INFO: HOME -> ./bin/
23/03/2016 12:44:35INFO: HOME -> ./bin/rancher-net
23/03/2016 12:44:35INFO: ROOT -> ./
23/03/2016 12:44:35INFO: ROOT -> ./usr/
23/03/2016 12:44:35INFO: ROOT -> ./usr/local/
23/03/2016 12:44:35INFO: ROOT -> ./usr/local/etc/
23/03/2016 12:44:35INFO: ROOT -> ./usr/local/etc/strongswan.d/
23/03/2016 12:44:35INFO: ROOT -> ./usr/local/etc/strongswan.d/charon-log.conf
23/03/2016 12:44:35INFO: ROOT -> ./etc/
23/03/2016 12:44:35INFO: ROOT -> ./etc/init.d/
23/03/2016 12:44:35INFO: ROOT -> ./etc/init.d/rancher-net
23/03/2016 12:44:35INFO: ROOT -> ./etc/init.d/charon
23/03/2016 12:44:35INFO: ROOT -> ./etc/monit/
23/03/2016 12:44:35INFO: ROOT -> ./etc/monit/conf.d/
23/03/2016 12:44:35INFO: ROOT -> ./etc/monit/conf.d/rancher-net
23/03/2016 12:44:35INFO: ROOT -> ./etc/monit/conf.d/charon
23/03/2016 12:44:35INFO: ROOT -> ./etc/logrotate.d/
23/03/2016 12:44:35INFO: ROOT -> ./etc/logrotate.d/charon
23/03/2016 12:44:35INFO: Sending rancher-net applied 1-09b6a579e702e4782916162e7861e8a3c9fbe3ee018a5cb2e1e961dbd4ac56dc
23/03/2016 12:44:35INFO: Sending node-services applied 1-9e3dcab12585149a65d818082279708965cd826df1937a55b24f48428f8f0c06c6aa4e3a711905d3bc5141dfee7fa331
23/03/2016 12:44:35INFO: Getting ipsec
23/03/2016 12:44:35INFO: Updating ipsec
23/03/2016 12:44:35INFO: Downloading http://10.2.13.131:8080/v1//configcontent//ipsec current=
23/03/2016 12:44:36INFO: Running /var/lib/cattle/download/ipsec/ipsec-1-bd2fca96de41d6264c4160a4099fb092f5f057e4960a9265420c85c497503d37/apply.sh
23/03/2016 12:44:36RTNETLINK answers: No such file or directory
23/03/2016 12:44:36INFO: HOME -> ./
23/03/2016 12:44:36INFO: HOME -> ./etc/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/ipsec/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/ipsec/psk.txt
23/03/2016 12:44:36monit: generated unique Monit id e9f619aee89f1be6f15f461fc4a0ac5f and stored to '/var/lib/monit/id'
23/03/2016 12:44:36Reinitializing monit daemon
23/03/2016 12:44:36monit: No daemon process found
23/03/2016 12:44:36INFO: Sending ipsec applied 1-bd2fca96de41d6264c4160a4099fb092f5f057e4960a9265420c85c497503d37
23/03/2016 12:44:36INFO: Getting ipsec-hosts
23/03/2016 12:44:36INFO: Updating ipsec-hosts
23/03/2016 12:44:36INFO: Downloading http://10.2.13.131:8080/v1//configcontent//ipsec-hosts current=
23/03/2016 12:44:36INFO: Running /var/lib/cattle/download/ipsec-hosts/ipsec-hosts-2-d9e643568adffad1bf56033ff5cba1231b2269bdf1ad1e92181810fc6769449c/apply.sh
23/03/2016 12:44:36INFO: HOME -> ./
23/03/2016 12:44:36INFO: HOME -> ./etc/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/ipsec/
23/03/2016 12:44:36INFO: HOME -> ./etc/cattle/ipsec/config.json
23/03/2016 12:44:36INFO: Sending ipsec-hosts applied 2-d9e643568adffad1bf56033ff5cba1231b2269bdf1ad1e92181810fc6769449c
23/03/2016 12:44:36INFO: Getting hosts
23/03/2016 12:44:36INFO: Updating hosts
23/03/2016 12:44:36INFO: Downloading http://10.2.13.131:8080/v1//configcontent//hosts current=
23/03/2016 12:44:37INFO: Running /var/lib/cattle/download/hosts/hosts-2-df032449a7f08eeb26687a2e6ec2649d7fa7da0a45f897a92f55dc1b1d2451c1/apply.sh
23/03/2016 12:44:37INFO: HOME -> ./
23/03/2016 12:44:37INFO: HOME -> ./etc/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/dns/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/dns/answers.json
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/metadata/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/metadata/answers.yml
23/03/2016 12:44:37OKOKINFO: Sending hosts applied 2-df032449a7f08eeb26687a2e6ec2649d7fa7da0a45f897a92f55dc1b1d2451c1
23/03/2016 12:44:37INFO: Getting iptables
23/03/2016 12:44:37INFO: Updating iptables
23/03/2016 12:44:37INFO: Downloading http://10.2.13.131:8080/v1//configcontent//iptables current=
23/03/2016 12:44:37INFO: Running /var/lib/cattle/download/iptables/iptables-2-bf64a7ed197a703a43d7e7f0579eec66fdcb1530ad911e47640f6a12688d7853/apply.sh
23/03/2016 12:44:37SIOCSARP: Invalid argument
23/03/2016 12:44:37arp: cannot set entry on line 2 of etherfile content-home/etc/cattle/ethers !
23/03/2016 12:44:37INFO: HOME -> ./
23/03/2016 12:44:37INFO: HOME -> ./etc/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/iptables-save
23/03/2016 12:44:37INFO: HOME -> ./etc/cattle/ethers
23/03/2016 12:44:37INFO: Sending iptables applied 2-bf64a7ed197a703a43d7e7f0579eec66fdcb1530ad911e47640f6a12688d7853
23/03/2016 12:44:38INFO: Getting healthcheck
23/03/2016 12:44:38INFO: Updating healthcheck
23/03/2016 12:44:38INFO: Downloading http://10.2.13.131:8080/v1//configcontent//healthcheck current=
23/03/2016 12:44:38INFO: Running /var/lib/cattle/download/healthcheck/healthcheck-2-65d5d9de1a6d13e42cd660a6a178baf58129b2dec06f43aa69e0603c72ac427e/apply.sh
23/03/2016 12:44:38INFO: ROOT -> ./
23/03/2016 12:44:38INFO: ROOT -> ./etc/
23/03/2016 12:44:38INFO: ROOT -> ./etc/default/
23/03/2016 12:44:38INFO: ROOT -> ./etc/default/haproxy-monitor
23/03/2016 12:44:38INFO: ROOT -> ./etc/default/haproxy
23/03/2016 12:44:38INFO: ROOT -> ./etc/init.d/
23/03/2016 12:44:38INFO: ROOT -> ./etc/init.d/haproxy-monitor
23/03/2016 12:44:38INFO: ROOT -> ./etc/monit/
23/03/2016 12:44:38INFO: ROOT -> ./etc/monit/conf.d/
23/03/2016 12:44:38INFO: ROOT -> ./etc/monit/conf.d/haproxy-monitor
23/03/2016 12:44:38INFO: ROOT -> ./etc/monit/conf.d/healthcheck
23/03/2016 12:44:38INFO: ROOT -> ./etc/healthcheck/
23/03/2016 12:44:38INFO: ROOT -> ./etc/healthcheck/healthcheck.cfg
23/03/2016 12:44:38INFO: Sending healthcheck applied 2-65d5d9de1a6d13e42cd660a6a178baf58129b2dec06f43aa69e0603c72ac427e
23/03/2016 12:44:38INFO: HOME -> ./
23/03/2016 12:44:38INFO: HOME -> ./etc/
23/03/2016 12:44:38INFO: HOME -> ./etc/cattle/
23/03/2016 12:44:38INFO: HOME -> ./etc/cattle/startup-env
23/03/2016 12:44:38INFO: ROOT -> ./
23/03/2016 12:44:38INFO: ROOT -> ./etc/
23/03/2016 12:44:38INFO: ROOT -> ./etc/init.d/
23/03/2016 12:44:38INFO: ROOT -> ./etc/init.d/agent-instance-startup
23/03/2016 12:44:38INFO: Sending agent-instance-startup applied 2-6ea3be84571dd17da521e4f632e76489f2b0340a9c58e279af8e9297d448d3b5
23/03/2016 12:44:38Starting monit daemon with http interface at [localhost:2812]

Still the same with RC4 😢

My iptables rules are a little different:

Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    CATTLE_PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
2    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    CATTLE_POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
2    MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
3    MASQUERADE  udp  --  172.17.0.2           172.17.0.2           udp dpt:4500
4    MASQUERADE  udp  --  172.17.0.2           172.17.0.2           udp dpt:500

Chain CATTLE_POSTROUTING (1 references)
num  target     prot opt source               destination         
1    ACCEPT     all  --  10.42.0.0/16         169.254.169.250     
2    MASQUERADE  tcp  --  10.42.0.0/16        !10.42.0.0/16         masq ports: 1024-65535
3    MASQUERADE  udp  --  10.42.0.0/16        !10.42.0.0/16         masq ports: 1024-65535
4    MASQUERADE  all  --  10.42.0.0/16        !10.42.0.0/16        
5    SNAT       all  -- !10.42.0.0/16         169.254.169.250      mark match 0x14902 to:10.42.84.226
6    SNAT       all  -- !10.42.0.0/16         169.254.169.250      mark match 0x1877e to:10.42.100.222
7    MASQUERADE  tcp  --  172.17.0.0/16        0.0.0.0/0            masq ports: 1024-65535
8    MASQUERADE  udp  --  172.17.0.0/16        0.0.0.0/0            masq ports: 1024-65535

Chain CATTLE_PREROUTING (1 references)
num  target     prot opt source               destination         
1    MARK       all  -- !10.42.0.0/16         169.254.169.250      MAC 02:58:49:4B:8A:9A MARK set 0x14902
2    MARK       all  -- !10.42.0.0/16         169.254.169.250      MAC 02:58:49:5E:D1:04 MARK set 0x1877e

Chain DOCKER (2 references)
num  target     prot opt source               destination         
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
2    DNAT       udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:4500 to:172.17.0.2:4500
3    DNAT       udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:500 to:172.17.0.2:500

I’m not sure where 169.254.169.250 comes from.

Is it possible to get some help here?

It may be a problem with the Scaleway Docker image, but I’m not sure.
Here is an open bug: https://github.com/scaleway-community/scaleway-docker/issues/58
If someone from Rancher could lend a hand, that would be really nice.

Thx.

Hi,

I work at Scaleway and I built the initial Scaleway Docker image.

I can reproduce the issue, but I can’t figure out what is blocking. I’m available if you need help, here or on IRC: @moul

Can you provide the logs for networking?

http://docs.rancher.com/rancher/faqs/#where-can-i-find-logs-for-networking

Have you also looked at all the cross-host communication troubleshooting FAQs?

http://docs.rancher.com/rancher/faqs/#cross-host-communication

rancher-net.log:

time="2016-04-05T11:19:45Z" level=info msg="Writing pid 829 to /var/run/rancher-net.pid"
time="2016-04-05T11:19:45Z" level=info msg="Charon running PID: 809"
time="2016-04-05T11:19:45Z" level=info msg=Reconfiguring
time="2016-04-05T11:19:45Z" level=info msg="Listening on :8111"
time="2016-04-05T11:19:45Z" level=info msg="Loaded pre-shared key for "
time="2016-04-05T11:19:45Z" level=info msg="Listening for ARP requests on eth0"
time="2016-04-05T11:19:45Z" level=info msg="Loaded pre-shared key for 172.17.0.2"
time="2016-04-05T11:19:50Z" level=info msg="Unknown SPIs: map[string]int{\"2a\":1}, Conns: []goStrongswanVici.VpnConnInfo{}"
time="2016-04-05T11:19:55Z" level=info msg="Unknown SPIs: map[string]int{\"2a\":2}, Conns: []goStrongswanVici.VpnConnInfo{}"
time="2016-04-05T11:20:00Z" level=info msg="Unknown SPIs: map[string]int{\"2a\":3}, Conns: []goStrongswanVici.VpnConnInfo{}"
time="2016-04-05T11:20:05Z" level=info msg="Unknown SPIs: map[string]int{\"2a\":4}, Conns: []goStrongswanVici.VpnConnInfo{}"
time="2016-04-05T11:20:10Z" level=info msg="Deleting unknown SPI 2a: netlink.XfrmState{Dst:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x1, 0x1, 0x1, 0x1}, Src:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x1, 0x1, 0x1, 0x1}, Proto:0x32, Mode:0x1, Spi:42, Reqid:0, ReplayWindow:0, Auth:(*netlink.XfrmStateAlgo)(nil), Crypt:(*netlink.XfrmStateAlgo)(nil), Encap:(*netlink.XfrmStateEncap)(nil)}"
time="2016-04-05T11:21:24Z" level=info msg=Reconfiguring
time="2016-04-05T11:21:25Z" level=info msg=Reconfiguring
time="2016-04-05T11:21:25Z" level=info msg="Loaded pre-shared key for 10.2.16.215"
time="2016-04-05T11:21:25Z" level=info msg="Loaded connection: conn-10.2.16.215"
time="2016-04-05T11:21:25Z" level=info msg="Loaded pre-shared key for 10.42.28.139"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.0.0/16 Src:10.42.142.205/32 Dir:dir fwd Priority:0 Index:0 Tmpls:[{Dst:172.17.0.2 Src:10.2.16.215 Proto:esp Mode:tunnel Reqid:1234}]}"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.28.139/32 Src:10.42.0.0/16 Dir:dir out Priority:0 Index:0 Tmpls:[{Dst:10.2.16.215 Src:172.17.0.2 Proto:esp Mode:tunnel Reqid:1234}]}"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.0.0/16 Src:10.42.28.139/32 Dir:dir in Priority:0 Index:0 Tmpls:[{Dst:172.17.0.2 Src:10.2.16.215 Proto:esp Mode:tunnel Reqid:1234}]}"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.0.0/16 Src:10.42.28.139/32 Dir:dir fwd Priority:0 Index:0 Tmpls:[{Dst:172.17.0.2 Src:10.2.16.215 Proto:esp Mode:tunnel Reqid:1234}]}"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.142.205/32 Src:10.42.0.0/16 Dir:dir out Priority:0 Index:0 Tmpls:[{Dst:10.2.16.215 Src:172.17.0.2 Proto:esp Mode:tunnel Reqid:1234}]}"
time="2016-04-05T11:21:25Z" level=info msg="Added policy: {Dst:10.42.0.0/16 Src:10.42.142.205/32 Dir:dir in Priority:0 Index:0 Tmpls:[{Dst:172.17.0.2 Src:10.2.16.215 Proto:esp Mode:tunnel Reqid:1234}]}"

charon.log:

Apr  5 11:19:44 00[DMN] Starting IKE charon daemon (strongSwan 5.3.5, Linux 4.4.6-docker-3, x86_64)
Apr  5 11:19:44 00[KNL] received netlink error: Address family not supported by protocol (97)
Apr  5 11:19:44 00[KNL] unable to create IPv6 routing table rule
Apr  5 11:19:44 00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts'
Apr  5 11:19:44 00[LIB] opening directory '/usr/local/etc/ipsec.d/cacerts' failed: No such file or directory
Apr  5 11:19:44 00[CFG]   reading directory failed
Apr  5 11:19:44 00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts'
Apr  5 11:19:44 00[LIB] opening directory '/usr/local/etc/ipsec.d/aacerts' failed: No such file or directory
Apr  5 11:19:44 00[CFG]   reading directory failed
Apr  5 11:19:44 00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts'
Apr  5 11:19:44 00[LIB] opening directory '/usr/local/etc/ipsec.d/ocspcerts' failed: No such file or directory
Apr  5 11:19:44 00[CFG]   reading directory failed
Apr  5 11:19:44 00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts'
Apr  5 11:19:44 00[LIB] opening directory '/usr/local/etc/ipsec.d/acerts' failed: No such file or directory
Apr  5 11:19:44 00[CFG]   reading directory failed
Apr  5 11:19:44 00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls'
Apr  5 11:19:44 00[LIB] opening directory '/usr/local/etc/ipsec.d/crls' failed: No such file or directory
Apr  5 11:19:44 00[CFG]   reading directory failed
Apr  5 11:19:44 00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets'
Apr  5 11:19:44 00[CFG] opening secrets file '/usr/local/etc/ipsec.secrets' failed: No such file or directory
Apr  5 11:19:44 00[JOB] spawning 16 worker threads
Apr  5 11:19:45 15[CFG] loaded IKE shared key for: '%any'
Apr  5 11:19:45 14[CFG] loaded IKE shared key for: '172.17.0.2'
Apr  5 11:21:25 04[CFG] loaded IKE shared key for: '10.2.16.215'
Apr  5 11:21:25 15[CFG] added vici connection: conn-10.2.16.215
Apr  5 11:21:25 15[CFG] initiating 'child-10.2.16.215'
Apr  5 11:21:25 15[IKE] <conn-10.2.16.215|1> initiating IKE_SA conn-10.2.16.215[1] to 10.2.16.215
Apr  5 11:21:25 15[ENC] <conn-10.2.16.215|1> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) ]
Apr  5 11:21:25 15[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:21:25 07[CFG] loaded IKE shared key for: '10.42.28.139'
Apr  5 11:21:29 14[IKE] <conn-10.2.16.215|1> retransmit 1 of request with message ID 0
Apr  5 11:21:29 14[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:21:37 08[IKE] <conn-10.2.16.215|1> retransmit 2 of request with message ID 0
Apr  5 11:21:37 08[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:21:50 08[IKE] <conn-10.2.16.215|1> retransmit 3 of request with message ID 0
Apr  5 11:21:50 08[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:22:13 10[IKE] <conn-10.2.16.215|1> retransmit 4 of request with message ID 0
Apr  5 11:22:13 10[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:22:19 07[NET] <2> received packet: from 10.2.16.215[1] to 172.17.0.2[500] (448 bytes)
Apr  5 11:22:19 07[ENC] <2> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) ]
Apr  5 11:22:19 07[IKE] <2> 10.2.16.215 is initiating an IKE_SA
Apr  5 11:22:19 07[IKE] <2> local host is behind NAT, sending keep alives
Apr  5 11:22:19 07[IKE] <2> remote host is behind NAT
Apr  5 11:22:19 07[ENC] <2> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) N(MULT_AUTH) ]
Apr  5 11:22:19 07[NET] <2> sending packet: from 172.17.0.2[500] to 10.2.16.215[1] (456 bytes)
Apr  5 11:22:19 15[NET] <2> received packet: from 10.2.16.215[4500] to 172.17.0.2[4500] (236 bytes)
Apr  5 11:22:19 15[ENC] <2> parsed IKE_AUTH request 1 [ IDi AUTH SA TSi TSr N(MOBIKE_SUP) N(ADD_4_ADDR) N(ADD_4_ADDR) N(MULT_AUTH) N(EAP_ONLY) ]
Apr  5 11:22:19 15[CFG] <2> looking for peer configs matching 172.17.0.2[%any]...10.2.16.215[172.17.0.2]
Apr  5 11:22:19 15[CFG] <conn-10.2.16.215|2> selected peer config 'conn-10.2.16.215'
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> authentication of '172.17.0.2' with pre-shared key successful
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> peer supports MOBIKE
Apr  5 11:22:19 15[CFG] <conn-10.2.16.215|2> no IDr configured, fall back on IP address
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> authentication of '172.17.0.2' (myself) with pre-shared key
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> IKE_SA conn-10.2.16.215[2] established between 172.17.0.2[172.17.0.2]...10.2.16.215[172.17.0.2]
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> scheduling rekeying in 13774s
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> maximum IKE_SA lifetime 15214s
Apr  5 11:22:19 15[KNL] <conn-10.2.16.215|2> received netlink error: No such file or directory (2)
Apr  5 11:22:19 15[KNL] <conn-10.2.16.215|2> unable to add SAD entry with SPI c79f5025
Apr  5 11:22:19 15[KNL] <conn-10.2.16.215|2> received netlink error: No such file or directory (2)
Apr  5 11:22:19 15[KNL] <conn-10.2.16.215|2> unable to add SAD entry with SPI cc0cf3a5
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> unable to install inbound and outbound IPsec SA (SAD) in kernel
Apr  5 11:22:19 15[IKE] <conn-10.2.16.215|2> failed to establish CHILD_SA, keeping IKE_SA
Apr  5 11:22:19 15[ENC] <conn-10.2.16.215|2> generating IKE_AUTH response 1 [ IDr AUTH N(MOBIKE_SUP) N(ADD_4_ADDR) N(ADD_4_ADDR) N(NO_PROP) ]
Apr  5 11:22:19 15[NET] <conn-10.2.16.215|2> sending packet: from 172.17.0.2[4500] to 10.2.16.215[4500] (156 bytes)
Apr  5 11:22:39 05[IKE] <conn-10.2.16.215|2> sending keep alive to 10.2.16.215[4500]
Apr  5 11:22:55 04[IKE] <conn-10.2.16.215|1> retransmit 5 of request with message ID 0
Apr  5 11:22:55 04[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[500] to 10.2.16.215[500] (448 bytes)
Apr  5 11:22:55 06[NET] <conn-10.2.16.215|1> received packet: from 10.2.16.215[500] to 172.17.0.2[500] (456 bytes)
Apr  5 11:22:55 06[ENC] <conn-10.2.16.215|1> parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) N(MULT_AUTH) ]
Apr  5 11:22:55 06[IKE] <conn-10.2.16.215|1> local host is behind NAT, sending keep alives
Apr  5 11:22:55 06[IKE] <conn-10.2.16.215|1> remote host is behind NAT
Apr  5 11:22:55 06[CFG] <conn-10.2.16.215|1> no IDi configured, fall back on IP address
Apr  5 11:22:55 06[IKE] <conn-10.2.16.215|1> authentication of '172.17.0.2' (myself) with pre-shared key
Apr  5 11:22:55 06[IKE] <conn-10.2.16.215|1> establishing CHILD_SA child-10.2.16.215
Apr  5 11:22:55 06[ENC] <conn-10.2.16.215|1> generating IKE_AUTH request 1 [ IDi AUTH SA TSi TSr N(MOBIKE_SUP) N(ADD_4_ADDR) N(ADD_4_ADDR) N(MULT_AUTH) N(EAP_ONLY) ]
Apr  5 11:22:55 06[NET] <conn-10.2.16.215|1> sending packet: from 172.17.0.2[4500] to 10.2.16.215[4500] (236 bytes)
Apr  5 11:22:55 15[NET] <conn-10.2.16.215|1> received packet: from 10.2.16.215[4500] to 172.17.0.2[4500] (156 bytes)
Apr  5 11:22:55 15[ENC] <conn-10.2.16.215|1> parsed IKE_AUTH response 1 [ IDr AUTH N(MOBIKE_SUP) N(ADD_4_ADDR) N(ADD_4_ADDR) N(NO_PROP) ]
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> authentication of '172.17.0.2' with pre-shared key successful
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> IKE_SA conn-10.2.16.215[1] established between 172.17.0.2[172.17.0.2]...10.2.16.215[172.17.0.2]
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> scheduling rekeying in 14398s
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> maximum IKE_SA lifetime 15838s
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> received NO_PROPOSAL_CHOSEN notify, no CHILD_SA built
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> failed to establish CHILD_SA, keeping IKE_SA
Apr  5 11:22:55 15[IKE] <conn-10.2.16.215|1> peer supports MOBIKE
Apr  5 11:22:59 09[IKE] <conn-10.2.16.215|2> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:19 04[IKE] <conn-10.2.16.215|1> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:19 15[IKE] <conn-10.2.16.215|2> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:39 07[IKE] <conn-10.2.16.215|1> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:39 05[IKE] <conn-10.2.16.215|2> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:55 09[CFG] loaded IKE shared key for: '10.2.13.25'
Apr  5 11:23:55 08[CFG] added vici connection: conn-10.2.13.25
Apr  5 11:23:55 08[CFG] initiating 'child-10.2.13.25'
Apr  5 11:23:55 08[IKE] <conn-10.2.13.25|3> initiating IKE_SA conn-10.2.13.25[3] to 10.2.13.25
Apr  5 11:23:55 08[ENC] <conn-10.2.13.25|3> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) ]
Apr  5 11:23:55 08[NET] <conn-10.2.13.25|3> sending packet: from 172.17.0.2[500] to 10.2.13.25[500] (448 bytes)
Apr  5 11:23:55 15[CFG] loaded IKE shared key for: '10.42.75.236'
Apr  5 11:23:59 06[IKE] <conn-10.2.16.215|1> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:59 07[IKE] <conn-10.2.16.215|2> sending keep alive to 10.2.16.215[4500]
Apr  5 11:23:59 08[IKE] <conn-10.2.13.25|3> retransmit 1 of request with message ID 0
Apr  5 11:23:59 08[NET] <conn-10.2.13.25|3> sending packet: from 172.17.0.2[500] to 10.2.13.25[500] (448 bytes)
Apr  5 11:24:00 06[KNL] creating acquire job for policy 10.42.51.38/32[tcp/44262] === 10.42.54.116/32[tcp/24007] with reqid {1234}
Apr  5 11:24:00 06[CFG] trap not found, unable to acquire reqid 1234
Apr  5 11:24:01 05[KNL] creating acquire job for policy 10.42.51.38/32[tcp/51910] === 10.42.142.205/32[tcp/24007] with reqid {1234}
Apr  5 11:24:01 05[CFG] trap not found, unable to acquire reqid 1234
Apr  5 11:24:07 14[IKE] <conn-10.2.13.25|3> retransmit 2 of request with message ID 0
Apr  5 11:24:07 14[NET] <conn-10.2.13.25|3> sending packet: from 172.17.0.2[500] to 10.2.13.25[500] (448 bytes)
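
In the log above the IKE_SA is established, but the CHILD_SA is not, and the lines that say so are easy to miss among the keep-alives. A small filter like the following sketch pulls them out. The function name is made up, and the pattern list is simply taken from the failure messages that appear in this log, so it is not exhaustive.

```sh
#!/bin/sh
# Sketch: keep only the charon.log lines that report a failure.
# Reads the log on standard input.
charon_failures() {
    grep -E 'netlink error|unable to|failed to|NO_PROPOSAL_CHOSEN|trap not found'
}
```

It could be run as `charon_failures < /var/log/charon.log` inside the network agent container. That path is an assumption, so use wherever charon.log lives on your agent.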

I suspect your kernel does not support everything we need for IPsec. Can you check /proc/config.gz or /boot/config-$(uname -r) against https://wiki.strongswan.org/projects/strongswan/wiki/KernelModules?

Executing this script:

#!/bin/sh
zcat /proc/config.gz | grep '\<CONFIG_XFRM_USER\>'
zcat /proc/config.gz | grep '\<CONFIG_NET_KEY\>'
zcat /proc/config.gz | grep '\<CONFIG_INET\>'
zcat /proc/config.gz | grep '\<CONFIG_IP_ADVANCED_ROUTER\>'
zcat /proc/config.gz | grep '\<CONFIG_IP_MULTIPLE_TABLES\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_AH\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_ESP\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_IPCOMP\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_XFRM_MODE_TRANSPORT\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_XFRM_MODE_TUNNEL\>'
zcat /proc/config.gz | grep '\<CONFIG_INET_XFRM_MODE_BEET\>'
zcat /proc/config.gz | grep '\<CONFIG_IPV6\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_AH\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_ESP\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_IPCOMP\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_XFRM_MODE_TRANSPORT\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_XFRM_MODE_TUNNEL\>'
zcat /proc/config.gz | grep '\<CONFIG_INET6_XFRM_MODE_BEET\>'
zcat /proc/config.gz | grep '\<CONFIG_IPV6_MULTIPLE_TABLES\>'
zcat /proc/config.gz | grep '\<CONFIG_NETFILTER\>'
zcat /proc/config.gz | grep '\<CONFIG_NETFILTER_XTABLES\>'
zcat /proc/config.gz | grep '\<CONFIG_NETFILTER_XT_MATCH_POLICY\>'

I get this in response:

CONFIG_XFRM_USER=m
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_IPV6=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
# CONFIG_IPV6_MULTIPLE_TABLES is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_MATCH_POLICY=m
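
One way to read this output: `=y` means the option is built into the kernel, `=m` means it is built as a loadable module (which still has to be loadable on the running system), and `is not set` means it is absent. The sketch below sorts the lines into those three groups. It assumes the config text arrives on standard input, and the function name and labels are made up for illustration.

```sh
#!/bin/sh
# Sketch: label each kernel config line as builtin, module, or missing.
# Lines that match none of the three forms are ignored.
classify_config() {
    awk '
        /^CONFIG_[A-Z0-9_]+=y$/ { sub(/=y$/, ""); print "builtin " $0; next }
        /^CONFIG_[A-Z0-9_]+=m$/ { sub(/=m$/, ""); print "module " $0; next }
        /^# CONFIG_[A-Z0-9_]+ is not set$/ { print "missing " $2 }
    '
}
```

Piping the script output above through `classify_config` puts the `module` entries on their own lines. For those, `lsmod` on the host shows whether the corresponding module (for example `xfrm_user`) is currently loaded.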

Problem fixed with the latest kernel version from Scaleway. It would be useful to document somewhere what Rancher needs in terms of dependencies and Linux kernel modules.