I’m not even sure I know how to explain this, but I’ll give it a shot.
I’m using Rancher in AWS. I have an AWS ALB as my only exposed-to-the-internet component (spoiler alert! it’s this desire not to have the Rancher LB exposed to the internet that eventually led to the issues I’ll describe later). That ALB feeds requests to a Rancher LB (i.e., HAProxy). The Rancher LB has several selector rules: “standard” on port 80, “alt1” on port 91, and “alt2” on port 92.
SSL terminates at the AWS ALB, and all requests are expected to be HTTPS, so we have listeners on 443, 444, and 445 (443 being the standard HTTPS port, of course). Each of these feeds back to the Rancher LB on ports 80, 91, and 92, respectively.
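To put the whole chain in one place, here’s roughly how a request flows (this is just the mapping described above, written out):

internet --HTTPS--> ALB :443 --> Rancher LB :80 --> services labeled lbroute=standard
internet --HTTPS--> ALB :444 --> Rancher LB :91 --> services labeled lbroute=alt1
internet --HTTPS--> ALB :445 --> Rancher LB :92 --> services labeled lbroute=alt2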
Finally, we have applications in our stacks whose service definitions carry labels that those selectors match on. For example:
docker-compose.yml
version: '2'
services:
  serviceA:
    image: someimageA
    labels:
      lbroute: standard
  serviceB:
    image: someimageB
    labels:
      lbroute: alt1
  serviceC:
    image: someimageC
    labels:
      lbroute: alt2
rancher-compose.yml
version: '2'
services:
  serviceA:
    lb_config:
      port_rules:
      - target_port: 9200
        hostname: dev1.example.com
  serviceB:
    lb_config:
      port_rules:
      - target_port: 8080
        hostname: dev1.example.com
  serviceC:
    lb_config:
      port_rules:
      - target_port: 5601
        hostname: dev1.example.com
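I didn’t show the Rancher LB service itself above. For reference, the selector side of things lives in the LB’s own config and looks roughly like this (the service name and image are just placeholders here, so don’t copy this verbatim):

# docker-compose.yml (the LB service itself)
version: '2'
services:
  lb:
    image: rancher/lb-service-haproxy
    ports:
    - 80:80
    - 91:91
    - 92:92

# rancher-compose.yml (selector rules that pair with the lbroute labels above)
version: '2'
services:
  lb:
    lb_config:
      port_rules:
      - source_port: 80
        selector: lbroute=standard
      - source_port: 91
        selector: lbroute=alt1
      - source_port: 92
        selector: lbroute=alt2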
I really thought this would work. If a request comes into the Rancher LB on port 91, it would route to serviceB, right? Nope… not being much of an HAProxy expert, I banged my head against a wall for two days before I finally figured this out, and I wanted to share the results as a cautionary tale.
In the above situation, Rancher generates HAProxy rules that look like this:
bind *:80
acl 80_dev1_example_com__host hdr(host) -i dev1.example.com
acl 80_dev1_example_com__host hdr(host) -i dev1.example.com:80
use_backend 80_dev1_example_com_ if 80_dev1_example_com__host
Again, I’m not an HAProxy expert, so it took me a bit to work that out, but it’s basically saying: for a request that comes in on port 80, if the “Host” header matches “dev1.example.com” or “dev1.example.com:80”, then send the request to the appropriate backend server/container.
This all works great for the standard ports (80/443) because your Host header will come in with no port in the string, so it will clearly match the first rule. Things break down in ways that were, for me, very difficult to debug when there actually IS a port in the Host header. Take, for example, this rule:
bind *:91
acl 91_dev1_example_com__host hdr(host) -i dev1.example.com
acl 91_dev1_example_com__host hdr(host) -i dev1.example.com:91
use_backend 91_dev1_example_com_ if 91_dev1_example_com__host
Recall that I have an AWS ALB as my gateway, and it’s listening on port 444 for this particular request, then passing it back on port 91 to be handled by the Rancher LB. Well, the Host header looks like this: dev1.example.com:444. I didn’t know this, of course, because I thought the Rancher LB was just checking the incoming port and basing its routing on that alone, or, if it was using the host to check, I didn’t consider that the port would be included in the hostname comparison. As you can see, “dev1.example.com:444” will not match either of the hostname checks, so the request is silently routed to the “default” backend for this frontend, which in this case is unspecified. An unspecified default just looks for a backend with the same name as the frontend, but that doesn’t exist here, so you end up with a 503 telling you there is no backend server available.
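Spelled out, the mismatch looks like this:

# Host header the ALB actually forwards:
#   dev1.example.com:444
# Values the generated ACLs on the port-91 frontend will match:
#   dev1.example.com
#   dev1.example.com:91
# -> no match, no default backend, hence the 503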
As I’m describing this, I hope it makes sense, but trying to figure it out was a bear and a half, to say the least. My key incorrect assumption above, that the Rancher LB routes on the incoming port alone, is what led to the confusion; however, I’m not sure that assumption was unfounded given the documentation and the way the routing is described in the yml files.
This could have been avoided had I used the same ports in the Rancher LB as I used in the AWS ALB (in fact I may go back and do just that).
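If I had done that, say listening on 444 at the Rancher LB as well, the generated rules would presumably follow the same pattern as above, and the second ACL would line up with the Host header the ALB sends:

bind *:444
acl 444_dev1_example_com__host hdr(host) -i dev1.example.com
acl 444_dev1_example_com__host hdr(host) -i dev1.example.com:444   # <- matches the ALB's Host header
use_backend 444_dev1_example_com_ if 444_dev1_example_com__host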
After figuring all of this out, my workaround was to make the hostname rules in my rancher-compose file use wildcards like:
version: '2'
services:
  serviceA:
    lb_config:
      port_rules:
      - target_port: 9200
        hostname: dev1.example.com*
  serviceB:
    lb_config:
      port_rules:
      - target_port: 8080
        hostname: dev1.example.com*
  serviceC:
    lb_config:
      port_rules:
      - target_port: 5601
        hostname: dev1.example.com*
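I haven’t gone back and diffed the generated haproxy.cfg after this change, but my assumption is that the trailing wildcard turns the exact-match ACLs into a prefix match on the Host header, something along the lines of:

acl 91_dev1_example_com__host hdr_beg(host) -i dev1.example.com

which matches dev1.example.com:444 (and anything else starting with that hostname), so the port suffix stops mattering.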
So, I’m kind of torn. I certainly wouldn’t call this a “bug,” but I do think, given the setup, that it’s not unreasonable for someone to have assumed the configuration I initially had in place would work.