Okay guys, I'm having a pretty substantial issue that, if not resolved, kind of takes Rancher off the table for me: services are not linking across servers. I am NOT currently using a firewall, but I am running AppArmor. I first noticed the issue when I tried to link a memcached instance on one host to a service on another. Rancher shows them as linked, but the client throws errors when it tries to connect. I also saw the same thing when trying to connect a phpMyAdmin instance to a MariaDB server on a different host. Both of these connections work properly when the containers are on the same server, but fail when they are on different hosts.
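One way to narrow down a symptom like this is to test a raw TCP connection from inside the client container to the linked service, taking the application (phpMyAdmin, the memcached client) out of the picture. The sketch below assumes python3 is available in the container; it only tells you whether the port answers within the timeout, not why it doesn't.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each resulting address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections, unreachable
        # networks and timeouts (socket.timeout is a subclass of OSError)
        return False
```

For example, `can_connect("memcached", 11211)` or `can_connect("mariadb", 3306)`, where the link aliases are hypothetical and 11211/3306 are the default memcached and MariaDB ports. If the call succeeds when both containers share a host but fails across hosts, the problem is in the cross-host network path rather than in the application.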
This is almost always caused by a firewall or by hosts registered with unreachable IPs. The network agents on the two hosts need to be able to open connections on 500/udp and 4500/udp from the network agent container to the public IP of each host, as shown for that host in the Infrastructure tab of the UI.
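A common way to end up with an unreachable registration is a host that registered with a loopback, link-local, Docker-bridge or private address that the other hosts cannot route to. The sketch below flags such addresses, assuming you copy each host's IP from the Infrastructure tab by hand. The host names and the heuristics are illustrative, and the 172.17.0.0/16 check assumes Docker's default bridge subnet. A private address is only a problem when the hosts do not share that network, so it is reported as a warning rather than an error.

```python
import ipaddress

# Assumption: Docker's default bridge subnet; adjust if your daemon uses another.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def suspicious_host_ips(registered: dict[str, str]) -> dict[str, str]:
    """Flag registered host IPs that other hosts are unlikely to reach.

    `registered` maps a host name to the IP shown for it in the UI.
    Returns host name -> reason for every address that looks suspicious.
    """
    problems = {}
    for host, ip_text in registered.items():
        ip = ipaddress.ip_address(ip_text)
        # Order matters: loopback, link-local and the bridge range are all
        # also "private", so the more specific checks come first.
        if ip.is_loopback:
            problems[host] = "loopback address"
        elif ip.is_link_local:
            problems[host] = "link-local address"
        elif ip in DOCKER_BRIDGE:
            problems[host] = "default Docker bridge range"
        elif ip.is_private:
            problems[host] = "private address; only valid if all hosts share that network"
    return problems
```

Any host flagged here is worth re-registering with an address the other hosts can reach on 500/udp and 4500/udp.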
I checked the hosts' iptables rules as well, and all the CATTLE_PREROUTING entries are present and point to the public IP of the master server. Basically, if I go to the URL defined in the load balancer for the shared service, it hangs for about 10-15 seconds and then returns a 503 error.
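To double-check that the CATTLE_PREROUTING chain actually forwards both UDP ports mentioned above, you could scan the output of `iptables -t nat -S CATTLE_PREROUTING` for DNAT rules. This is a sketch that assumes the rules appear in the usual `-A CHAIN -p udp ... --dport N -j DNAT ...` form; the exact shape of Rancher's rules is not confirmed in this thread, and the sample input in the usage below is hypothetical.

```python
def _option(tokens: list[str], flag: str):
    """Return the value that follows `flag` in a tokenized iptables rule, or None."""
    try:
        return tokens[tokens.index(flag) + 1]
    except (ValueError, IndexError):
        return None


def dnat_udp_ports(rules_text: str, chain: str = "CATTLE_PREROUTING") -> set[int]:
    """Return the UDP destination ports that `chain` DNATs somewhere."""
    ports = set()
    for line in rules_text.splitlines():
        tokens = line.split()
        if tokens[:2] != ["-A", chain]:
            continue  # skip chain declarations, blank lines and other chains
        if _option(tokens, "-p") != "udp" or _option(tokens, "-j") != "DNAT":
            continue
        dport = _option(tokens, "--dport")
        if dport is not None and dport.isdigit():
            ports.add(int(dport))
    return ports


def missing_ipsec_ports(rules_text: str, chain: str = "CATTLE_PREROUTING") -> set[int]:
    """Ports from the 500/udp and 4500/udp pair that have no DNAT rule in `chain`."""
    return {500, 4500} - dnat_udp_ports(rules_text, chain)
```

An empty result from `missing_ipsec_ports` only means the forwarding rules exist on that host. Traffic can still be dropped on the way there, which is why the reachability of each host's registered IP matters as well.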
I've hit this issue a lot when running on AWS with the latest Amazon Linux AMI (as of this writing), which only supports Docker 1.7. At first it was often security groups blocking the traffic, but I have since opened them up and still see it happen. I never investigated software firewalls in this setup.
I have since upgraded to the latest RHEL AMI so I can try Convoy, which gives me Docker 1.9, and in the last 24 hours I have noticed a lot more stability in this area. My ELK stack always seems connected now, whereas before some logspout containers would always fail to see the Logstash instance.
Is Rancher currently sensitive to the Docker version?
There are some minor differences in how we have to do things depending on the version, and some specific features require newer versions, but in general we support Docker 1.6+.
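If you script a check against that 1.6 minimum, compare versions numerically: as plain strings, "1.10" sorts before "1.6". A minimal sketch, assuming you already have the version string (for example, read from `docker version` output). Only the general 1.6 floor from the reply is encoded, since the feature-specific minimums aren't stated here.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a Docker version string such as '1.9.1' or '1.7.1-rc2' into (1, 9, 1)."""
    core = text.strip().lstrip("v").split("-")[0]  # drop pre-release/build suffixes
    return tuple(int(part) for part in core.split("."))


def meets_minimum(text: str, minimum: str = "1.6") -> bool:
    """True if the Docker version is at least `minimum` (1.6 per the reply above)."""
    have, need = parse_version(text), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.6' compares equal to '1.6.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

With this, the Docker 1.7 on Amazon Linux and the Docker 1.9 on RHEL mentioned above both pass the general minimum, so the version floor alone wouldn't explain the difference in stability.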