Can't load balance across stacks

Sorry for the spam; I'm trying to migrate to a Rancher deployment and hitting various pain points.

I have an HTTP load balancer specified as follows (some hosts changed):

HTTP:
  ports:
  - '443'
  restart: always
  external_links:
  - Perceptron/Blog:Blog
  - Perceptron/App:App
  labels:
    io.rancher.loadbalancer.target.Perceptron/Blog: perceptronapp.com/blog=9778
    io.rancher.loadbalancer.ssl.ports: '443'
    io.rancher.loadbalancer.target.Jenkins: ci.my.software=8080
    io.rancher.loadbalancer.target.Perceptron/App: perceptronapp.com=3000
  tty: true
  image: rancher/load-balancer-service
  links:
  - Jenkins:Jenkins
  stdin_open: true

The ci.my.software host works just fine. The perceptronapp.com ones don’t.

In particular, I’m trying to hit perceptronapp.com/blog. I always get a “503: No servers are available” error.

I have verified that the Blog service is up: I can curl the IP returned by “docker inspect” directly. When a request hits a path that doesn’t exist, the Blog container logs a 404, and I see that log message when I curl an invalid path against that IP. If I request the same invalid path through the load balancer, nothing appears in those logs at all.

Also, I notice a few seconds of delay between when I curl perceptronapp.com/blog and when I get the 503. There is no such delay when I hit a my.software host.

I’ve also tried removing App and dropping the path from Blog so that it’s reachable at the root; no change. In my default environment I have a variety of services keyed off hostnames, but they’re all in the same stack.

Now I do notice that, although Blog exposes 9778 and is reachable via its Docker IP, it is not reachable on its overlay IP at 9778. Other containers are. The port is definitely exposed and appears so in Rancher, but for some reason it doesn’t get mapped to the overlay network. Why would a port be reachable at its Docker IP but not at its overlay IP?
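
Concretely, the check I’m doing looks roughly like this (the container names and the 10.42.x.y address are placeholders for my actual values; the overlay address is the one Rancher shows for the container):

# Bridge IP from docker inspect -- this responds as expected:
BLOG_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' <blog-container>)
curl -v "http://$BLOG_IP:9778/"

# Overlay (managed network) IP shown in the Rancher UI -- this hangs when tried from another container:
docker exec -it <other-container> curl -v --max-time 5 "http://10.42.x.y:9778/"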

Am I doing something wrong here? What else can I try? I was successfully running these containers on a single host with a Dockerized web server, so I’m confused as to why they don’t work on the overlay network.

Thanks.

The balancer will be using the overlay IP to talk to the container, so if you can’t reach 10.42.x.y:9778 from another container then it won’t be able to either. That couple of seconds of delay is HAProxy trying to reach any of the target containers, eventually giving up, and returning the 503 response itself.

So it sounds like that’s the real problem to focus on… How many hosts do you have? Does communication between containers on two different hosts (vs. two containers that happen to be on the same host) work at all?
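
A quick way to check that (names and addresses below are placeholders): exec into a container on one host and try to reach the managed 10.42.x.y address of a container on the other host:

# from host A, inside any managed container, against the 10.42 address of a container on host B
docker exec -it <container-on-host-A> ping -c 3 10.42.x.y
docker exec -it <container-on-host-A> curl -v --max-time 5 "http://10.42.x.y:9778/"

If that fails, the balancer’s 503s are just a symptom of the overlay network not passing traffic.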

Apologies for the delay in responding; it’s been a hell of a week and I’ve only now gotten to this.

My setup is fairly simple: everything runs on a single large dedicated server, partitioned into VMs. The only things running on the server itself, rather than inside a VM, are Rancher and an Nginx proxy providing SSL termination.

There are two environments, each with a single VPS host. Each VPS has a single public IP assigned and is running a load balancer, so no communication across hosts is currently occurring.

On the VPS where this is happening, I’m hosting several additional services on the same load balancer. They’re reachable and everything works just fine.

Would it help if I gave one or more of the Rancher developers access to my instance? I have two environments set up and a variety of containers running on each without problems; it’s only this one stack that’s causing these issues. So it seems less like user error and more like a subtle interaction gone bad, but I might be wrong, and I don’t know what specific factors to point to if I were to file an issue. Failing that, I’d be happy to provide any other details that might be helpful. This instance isn’t mission critical at the moment, so sharing its data isn’t a problem.

Thanks.

FWIW, I figured out the cause: it was the containerized app’s fault. The app is configured to bind only to the interface indicated by either ‘hostname’ in its configuration file or HOSTNAME in its environment. I hadn’t explicitly set a hostname in the config, and I knew for a fact that by default it binds to 0.0.0.0. But of course Docker sets $HOSTNAME, which told the app to bind only to its 172.17 address, since that’s what the container’s hostname resolves to. That’s fine for most Docker deployments and had worked for years, but when it broke under Rancher I naturally suspected Rancher itself.
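
For anyone who hits something similar, this is roughly how it shows up from inside the container (the 172.17.0.5 address is just an example, and the exact tools depend on what’s in the image):

# the container's hostname resolves to its bridge address, not 0.0.0.0
getent hosts "$HOSTNAME"      # e.g. 172.17.0.5  <container-id>
# and the app ends up listening only on that address
ss -lnt | grep 9778           # shows 172.17.0.5:9778 rather than 0.0.0.0:9778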

Explicitly setting the hostname to 0.0.0.0 in the app’s configuration seems to have resolved things. Apologies for the false alarm.