Load Balancer poor throughput

I have 2 containers behind a load balancer and I’m only achieving 2000 req/sec. When I benchmark the containers directly, bypassing the load balancer, I achieve 20_000 req/sec per container.

The containers and the load balancer are running on separate (dedicated) hardware on Packet.net, have the same specs, and are in the same datacenter.

Any idea what might be causing this? The overlay network perhaps?

We are starting to load test with our release candidates for GA. We’ve put in some performance fixes recently. Would you be able to test out our v1.0.0-rc1 to see if you’re still seeing the same performance issues?

When I benchmark the containers directly, bypassing the load balancer, I achieve 20_000 req/sec per container.

Do you run your benchmark tests from a server outside of the rancher managed network? If yes, would it be possible to try running the tests from a client inside the rancher network?

Any idea what might be causing this? The overlay network perhaps?

To figure that out, 3 sets of tests could be run:

  1. client runs outside of rancher managed network, against containers directly.
  2. client runs inside rancher managed network, against containers directly.
  3. client runs inside rancher managed network, against the LB having containers as targets.

If 1 and 2 are different, then the problem might be with the overlay network. If 1 and 2 are relatively the same, but 2 and 3 are drastically different, the problem could be with the LB. It’s possible to tune the LB service’s haproxy for your application’s needs by modifying the parameters of the “defaults” and “global” sections via the UI (related haproxy parameters: https://cbonte.github.io/haproxy-dconv/configuration-1.6.html#3.2).
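The comparison between the three tests can be sketched as a small helper. This is only an illustration: the function name `diagnose`, the 20% tolerance for "relatively the same", and the sample figures in the usage note are all assumptions, not anything Rancher provides.

```python
def diagnose(direct_outside, direct_inside, via_lb, tolerance=0.2):
    """Guess the bottleneck from three throughput figures (req/sec).

    direct_outside: test 1, client outside the managed network, no LB
    direct_inside:  test 2, client inside the managed network, no LB
    via_lb:         test 3, client inside the managed network, through the LB
    tolerance:      assumed threshold for "relatively the same" (20% here)
    """
    def similar(a, b):
        return abs(a - b) <= tolerance * max(a, b)

    suspects = []
    if not similar(direct_outside, direct_inside):
        suspects.append("overlay network")
    if not similar(direct_inside, via_lb):
        suspects.append("load balancer")
    return suspects or ["no clear bottleneck"]
```

For example, with hypothetical figures, `diagnose(20_000, 19_000, 2_000)` points at the load balancer, while `diagnose(20_000, 8_000, 7_500)` points at the overlay network.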

I’m not noticing significant improvements in v1.0.0-rc3. However, it appears that a large chunk of the poor throughput occurs when using the HTTP/HTTPS protocol. When using TCP I’m seeing 25_000 rps, instead of 2000-4000 rps.

The following benchmarks are performed using wrk, and TCP rather than HTTP/HTTPS.

internet wrk        -> host network container = 65_000 rps

internet wrk        -> managed network container = 50_000 rps
managed network wrk -> managed network container = 30_000 rps

internet wrk        -> haproxy -> managed network container = 25_000 rps
managed network wrk -> haproxy -> managed network container = 30_000 rps

While 25_000 rps isn’t nearly what haproxy should be capable of handling, it is at least much better than 2000-4000 rps.
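To make the drops easier to compare, the figures above can be expressed as a percentage of the host network baseline. This is a minimal sketch using only the numbers already posted; the helper name `percent_of_baseline` is made up for illustration.

```python
# TCP results from the wrk runs above, in requests per second.
results = {
    "internet wrk -> host network container": 65_000,
    "internet wrk -> managed network container": 50_000,
    "managed network wrk -> managed network container": 30_000,
    "internet wrk -> haproxy -> managed network container": 25_000,
    "managed network wrk -> haproxy -> managed network container": 30_000,
}

BASELINE = results["internet wrk -> host network container"]


def percent_of_baseline(rps, baseline=BASELINE):
    """Throughput as a percentage of the baseline, rounded to one decimal."""
    return round(100 * rps / baseline, 1)


if __name__ == "__main__":
    for path, rps in results.items():
        print(f"{percent_of_baseline(rps):6.1f}%  {path}")
```

Against that baseline, the managed network alone lands at roughly 77%, and going through haproxy from the internet at roughly 38%.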

My guess would be that the 25k to 2k drop is related in some way to keep-alive interaction between client <-> haproxy <-> target.
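To show what keep-alive changes at the connection level, here is a rough sketch that counts how many TCP connections a local test server accepts with and without connection reuse. It does not involve haproxy, wrk, or Rancher, and it does not confirm the guess above; `CountingHandler` and `count_connections` are hypothetical names used only for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        self.server.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def count_connections(requests, keep_alive):
    """Send `requests` GETs to a local server; return the connections it accepted."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    server.connections = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        shared = http.client.HTTPConnection("127.0.0.1", port) if keep_alive else None
        for _ in range(requests):
            conn = shared if shared is not None else http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before reusing the socket
            if not keep_alive:
                conn.close()
        if shared is not None:
            shared.close()
        return server.connections
    finally:
        server.shutdown()
        server.server_close()
```

With reuse, every request shares one connection; without it, each request pays for a new TCP handshake, and that cost is paid on both sides of a proxy if neither side keeps connections open.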

The rest is likely the additional hops through network agents, though some of the drops are surprising. How many balancer and target containers are there? Are they on the same host? We don’t currently have any way to prioritize targets on the same host, so with more than one target, or with haproxy on a different host, every request follows this path:

Client -> haproxy on host A -> network agent on A --encrypted-IPSec-tunnel--> network agent on host B -> target container on B

I was running two hosts in this case, one for haproxy and one for the container.

Is there any effort/issue that would give “local” containers priority in the Rancher LB, or some other way to achieve that? It’s quite interesting and useful imho…

esp when dealing with different backend services…