Services on different hosts are not linking

Okay guys, I'm having a pretty substantial issue that, if not resolved, pretty much takes Rancher off the table for me: services are not linking across hosts. I am NOT currently running a firewall, but I am running AppArmor. I first noticed the issue when I tried to link a memcached instance from one host to another - Rancher shows them as linked, but the client throws errors when it tries to connect. The same thing happens when connecting a phpMyAdmin instance to a MariaDB server on a different host. Both connections work properly when the services are on the same host, but fail when they are on different hosts.

Here is the rancher-compose file:

wordpress:
  image: 192.168.201.110:5000/wrl/nginx-php-cache:latest
  expose:
    - "80"
  restart: always
  links: 
    - memcached:memcached
    - db:mysql
  volumes: 
    - /sites/prod/loveandrew.org/current/app:/var/www/app:rw
  labels:
    io.rancher.scheduler.affinity:host_label: host=dev
  entrypoint: /sbin/my_init
db:
  image: 192.168.201.110:5000/wrl/mariadb:latest
  restart: always
  expose: 
    - "3306"
  environment:
    MYSQL_ROOT_PASSWORD: ECRS96xDxqtgbdceugLU8BqF
  volumes: 
    - /data/db/loveandrew.org:/var/lib/mysql
  labels:
    io.rancher.scheduler.affinity:host_label: host=dev
phpmyadmin: 
  image: nazarpc/phpmyadmin
  restart: always
  links:
    - db:mysql
  expose:
    - "80"
  labels:
    io.rancher.scheduler.affinity:host_label: host=dev
memcached:
  image: memcached:latest
  restart: always
  mem_limit: 128000000
  expose: 
    - "11211"
  labels:
    io.rancher.scheduler.affinity:host_label: host=dev

Right now, the only other host label I have is for another server, which is host=dbm. If I schedule a service on dbm, it just won't connect - the only thing I change is the affinity label, as in the snippet below.
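For example, moving the db service to the other host only means changing its label:

db:
  # everything else unchanged; only the affinity label differs
  labels:
    io.rancher.scheduler.affinity:host_label: host=dbm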

Please, I hope one of you guys can help me fix this - if I can’t resolve this I’m going to have to abandon my plans of using Rancher :’(

This is almost always firewalls or hosts registered with unreachable IPs… The network agents on the two hosts need to be able to open connections on 500/udp and 4500/udp from their containers to the public IP of the other host, as shown for each host on the Infrastructure tab in the UI.

http://docs.rancher.com/rancher/faqs/troubleshooting/#cross-host-communication
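A generic way to verify those ports outside of Rancher, assuming tcpdump and netcat are installed and eth0 is the public interface (netcat flags vary by variant; GNU netcat wants -l -p):

# on the receiving host, watch for the IPsec ports
sudo tcpdump -ni eth0 udp port 500 or udp port 4500

# from the other host, send a test datagram to the first host's public IP
echo ping | nc -u <public-ip-of-receiving-host> 500

If nothing shows up in tcpdump, something between the hosts (a security group, firewall, or NAT device) is dropping the traffic.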

The host works and containers deploy on it fine. These are the iptables rules for the slave:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
fail2ban-ssh  tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 22

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination
ACCEPT     udp  --  0.0.0.0/0            172.17.0.1           udp dpt:4500
ACCEPT     udp  --  0.0.0.0/0            172.17.0.1           udp dpt:500

Chain fail2ban-ssh (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

These are the rules on the master:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
fail2ban-ssh  tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 22
fail2ban-ssh  tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 22

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.13          tcp dpt:5000
ACCEPT     udp  --  0.0.0.0/0            172.17.0.29          udp dpt:4500
ACCEPT     udp  --  0.0.0.0/0            172.17.0.29          udp dpt:500
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.2           tcp dpt:8080
ACCEPT     tcp  --  0.0.0.0/0            172.17.0.19          tcp dpt:5000
ACCEPT     udp  --  0.0.0.0/0            172.17.0.27          udp dpt:4500
ACCEPT     udp  --  0.0.0.0/0            172.17.0.27          udp dpt:500

Chain fail2ban-ssh (2 references)
target     prot opt source               destination
REJECT     all  --  158.85.184.239       0.0.0.0/0            reject-with icmp-port-unreachable
RETURN     all  --  0.0.0.0/0            0.0.0.0/0
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

I checked the host iptables as well, and all the CATTLE_PREROUTING entries are there and point to the public IP of the master server. Basically, if I go to the URL defined in the load balancer for the shared service, it hangs for about 10-15 seconds and then returns a 503 error.
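For anyone checking the same thing, those entries live in the nat table (chain name as Rancher creates it on the host; the exact output will vary):

sudo iptables -t nat -L CATTLE_PREROUTING -n -v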

And just to confirm, are you able to ping from one container to a container on another host?
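A rough sketch of that test, assuming the default managed network (10.42.0.0/16); substitute a real container name on one host and the IP Rancher shows for a container on the other host:

# shell into a container on host A and ping a container on host B
# (10.42.166.97 is a made-up example IP; use the one shown in the UI)
docker exec -it <container-on-this-host> ping -c 3 10.42.166.97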

I've had this issue a lot when I was running on AWS with the latest Amazon Linux AMI (as of writing), which only supports Docker 1.7. While initially it was often security groups blocking the traffic, I have since opened them up and still seen this occur. I never investigated software firewalls in this setup.

I have since upgraded to the latest RHEL AMI so I can try Convoy, which gives me Docker 1.9, and in the last 24 hours I've noticed a lot more stability in this area - my ELK stack always seems connected now, whereas before some Logspout containers would always fail to see the Logstash instance.

Is Rancher currently sensitive to the Docker version?

There are some minor differences in how we have to do things depending on the version, and some specific features require newer versions, but in general we support Docker 1.6+.
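If you want to confirm what each host is actually running, the standard client command reports both client and daemon versions:

# run on each host; the Server section shows the daemon version Rancher talks to
docker version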