Connection refused when accessing mapped ports on hosts

I can’t access containers via the host’s IP even though the ports are mapped.

I want to use AWS Auto Scaling groups to spin up servers. The ASG has an ELB attached to it, with health checks.
When the ASG spins up a new server, it installs Docker and runs rancher/agent to register with Rancher and set some host labels. This works great.
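For reference, the registration step is roughly the standard rancher/agent run command, with the host labels passed in via CATTLE_HOST_LABELS (the server URL and token below are placeholders):

sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e CATTLE_HOST_LABELS='consul=true' \
  rancher/agent http://<rancher-server>:8080/v1/scripts/<registration-token>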

Rancher then installs services on the hosts based on those labels, which also works great. In this particular case, I’m running Consul. The docker-compose.yml maps the ports to the host and looks like this:

consul:
  image: progrium/consul
  ports:
  - 8400:8400
  - 8500:8500
  - 8300:8300
  - 8301:8301
  - 8301:8301/udp
  - 8302:8302
  - 8302:8302/udp
  - 53:53/udp
  command: -server -bootstrap -ui-dir /ui
  labels:
    io.rancher.scheduler.global: 'true'
    io.rancher.scheduler.affinity:host_label: consul=true

Now I’m testing the connection to consul, using a simple curl call:

curl http://<IP>:8500/v1/catalog/nodes

If I SSH to the host running the Consul container and curl the container’s IP (assigned by Rancher in the 10.42.0.0/16 range), it works fine. From the same host, if I try to curl the IP of the host itself, I get:

curl: (7) Failed to connect to <IP> port 8500: Connection refused
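To make the two tests concrete (the 10.42.x.x container address is illustrative; the host here is 10.0.2.88):

# run from the host that carries the consul container
curl http://10.42.x.x:8500/v1/catalog/nodes   # container IP: works
curl http://10.0.2.88:8500/v1/catalog/nodes   # host IP: connection refused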

The hosts are in a private subnet, and the attached security groups allow all incoming traffic between the hosts themselves, all incoming traffic from the bastion server (which runs Rancher server), and all incoming traffic from an ELB security group. I’ve also tried spinning up a new host directly from the Rancher UI, and I get exactly the same behavior.

So the ELB/ASG eventually fails its health checks, tears down the host, spins up a new one, and the cycle starts again.

When I inspect the Consul container in the Rancher UI, all the ports show as mapped and active.
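One way to cross-check from the host itself, outside the Rancher UI, is to ask Docker what it thinks is published. Note that with Rancher’s managed network the mapping may live in iptables rules written by the agent rather than in Docker’s own port bindings, so an empty result here isn’t conclusive:

docker ps | grep consul      # find the container ID
docker port <container-id>   # lists Docker-level bindings, e.g. 8500/tcp -> 0.0.0.0:8500, if any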

UFW is disabled, and here’s the iptables output from the host:

root@ip-10-0-2-88:~# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination
ACCEPT     udp  --  0.0.0.0/0            172.17.0.4           udp dpt:4500
ACCEPT     udp  --  0.0.0.0/0            172.17.0.4           udp dpt:500

I found a topic describing a similar problem to mine, but the solution there didn’t help me since UFW was already disabled:
http://forums.rancher.com/t/cannot-access-service-remotely-via-mapped-ports/211

I’m using:

Rancher: v0.31.0
Cattle: v0.73.0
User Interface: v0.39.0
Rancher Compose: beta/latest

Consul might not be the simplest thing to get up and running, so I also tried Redis with the following docker-compose.yml:

redis:
  image: redis:latest
  ports:
  - 6379:6379

Same problem: from the host I can use redis-cli to connect directly to the container’s IP, but I’m unable to connect through the host’s IP.
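Concretely (the 10.42.x.x container address is illustrative):

redis-cli -h 10.42.x.x -p 6379 ping   # container IP: PONG
redis-cli -h <host-ip> -p 6379 ping   # host IP: connection refused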

I feel I’m missing something really stupid :frowning:

I’m not too sure of your setup, but does your security group have all the ports open so that it can be accessed using the host IP?

@yogin
Could you share the docker inspect output for one of the containers that you can reach via the 10.42 address but not via its host IP?
Perhaps the Redis one, since its setup is simpler.

@yogin Could you also share the output of iptables -L -n -t nat?
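That is, something along these lines; it should show the DNAT rules mapping published ports to the containers’ 10.42.x.x addresses (the sample rule below is hypothetical):

sudo iptables -L -n -t nat
# expect DNAT entries roughly like:
# DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8500 to:10.42.x.x:8500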

I am not sure if this is the same situation, but I am finding it impossible to map ports explicitly to 0.0.0.0. It seems Rancher might be trying to be smarter than I need it to be, binding ports to ephemeral Docker network IPs that don’t route externally instead of to the host itself.
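For reference, the explicit bind I’m trying to express is the usual ip:host_port:container_port compose form, e.g. (illustrative snippet):

redis:
  image: redis:latest
  ports:
  - 0.0.0.0:6379:6379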

@Chas Are you using the bridge network? We do have a bug for bridge networking:

https://github.com/rancher/rancher/issues/1365

Actually no, I experienced this on the managed network. Trying the “host” network didn’t work at all. I haven’t touched the bridge network yet; I’m kind of fumbling around because our apps usually just sat on 0.0.0.0 and we largely ignored the docker0 network.

Ultimately I need to find a way to bind to 0.0.0.0 or I can’t move forward, due to the nature of the docker containers I am using :frowning:

@Chas it sounds more like you’re hitting this issue where we can’t bind to a specified IP. We’ll definitely be fixing this before GA.

Yeah, or just all IPs; some weird routing is expected by my org’s apps, heh.
For the most part I seem fine using the managed network without much further input, but I do have a few containers that need to be bound explicitly to the host and others that require 0.0.0.0.

Just finished the upgrade to the latest Rancher server with the networking fix.
This is exceptionally odd: from the command line I see the same symptoms after switching back to managed:

ccorbett@test-devnet-t04-e02:~$ docker ps -a | grep 8983
9fd390273531  registry.redacted.net/solr.server:latest  "/app/startup.sh"  38 seconds ago  Up 37 seconds  8983/tcp  3b884fb9-da6a-4b57-8ed8-50e46d70fd41
ccorbett@test-devnet-t04-e02:~$ sudo iptables -L -t nat | grep 8983
DNAT       tcp  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL tcp dpt:8983 to:10.42.21.212:8983
ccorbett@test-devnet-t04-e02:~$ curl localhost:8983
curl: (7) Failed to connect to localhost port 8983: Connection refused

But oddly, connections made from inside the Docker containers now seem to work perfectly fine (i.e. the application can talk to Solr, and NATed traffic from the public side can reach the HTTP endpoint).
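To illustrate, the in-container check that now works is roughly this (the container name is hypothetical; the 10.42.21.212:8983 target is from the DNAT rule above):

docker exec -it <app-container> curl http://10.42.21.212:8983   # succeeds from inside the managed network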

Just re-ran the upgrade to confirm. For some reason the application is doing OK, but we still do a lot of testing by accessing the application locally from the command line. Within Rancher, it looks like it is binding to a public IP that exists two systems “above” this one in the stack (everything is NATed down, which is why we use 0.0.0.0 binds).

Just curious as I don’t really know what is going on in the managed network.

It looks like you might be hitting this bug:

I’m kind of revisiting this: as of 50.1 I still cannot curl or otherwise access a service from the same host Rancher deploys it on.

E.g.:

6f6cf132a816  registry.internal/devnet.combo:latest  "/scedev/install/doc  20 hours ago  Up 20 hours  80/tcp, 8080/tcp

ccorbett@host:/var/tmp$ sudo iptables -L -t nat | grep http
DNAT       tcp  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL tcp dpt:http to:10.42.158.139:8080

If I go up one level and curl, I can see the service. But from localhost:

ccorbett@host:/var/tmp$ curl localhost
curl: (7) Failed to connect to localhost port 80: Connection refused