Load Balancer poor throughput

meskyanichi · February 20, 2016, 12:50am

I have 2 containers behind a load balancer and I’m only achieving 2000 req/sec. When I benchmark the containers directly, bypassing the load balancer, I achieve 20_000 req/sec per container.

The containers and the load balancer are running on separate (dedicated) hardware on Packet.net, have the same specs, and are in the same datacenter.

Any idea what might be causing this? The overlay network perhaps?

denise · March 22, 2016, 9:36pm

We are starting to load test with our release candidates for GA. We’ve put in some performance fixes recently. Would you be able to test out our v1.0.0-rc1 to see if you’re still seeing the same performance issues?

alena · March 24, 2016, 12:22am

“When I benchmark the containers directly, bypassing the load balancer, I achieve 20_000 req/sec per container.”

Are you running your benchmark from a server outside the Rancher managed network? If so, could you try running it from a client inside the Rancher managed network?

“Any idea what might be causing this? The overlay network perhaps?”

To figure that out, three sets of tests could be run:

1. Client runs outside the Rancher managed network, against the containers directly.
2. Client runs inside the Rancher managed network, against the containers directly.
3. Client runs inside the Rancher managed network, against the LB with the containers as targets.

If 1 and 2 differ, the problem might be with the overlay network. If 1 and 2 are roughly the same but 2 and 3 differ drastically, the problem could be with the LB as well. You can tune the LB service's haproxy for your application's needs by modifying the parameters of the “defaults” and “global” sections via the UI (for the related haproxy parameters, see: HAProxy version 1.6.6 - Configuration Manual).
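The comparison between the three tests can be written down as a small helper. This is only a sketch: the function name `diagnose` and the 20% `tolerance` cutoff for "relatively the same" are assumptions made for illustration, not anything defined by Rancher, and the inputs are whatever requests/sec figures your benchmark tool reports.

```python
def diagnose(direct_outside, direct_inside, via_lb_inside, tolerance=0.2):
    """Name the likely bottleneck(s) from three requests/sec measurements.

    direct_outside: test 1, client outside the managed network, containers directly
    direct_inside:  test 2, client inside the managed network, containers directly
    via_lb_inside:  test 3, client inside the managed network, through the LB
    tolerance:      assumed cutoff; results within 20% count as "the same"
    """
    def differs(a, b):
        largest = max(a, b)
        if largest == 0:
            return False  # both runs measured nothing, so there is no difference
        return abs(a - b) / largest > tolerance

    suspects = []
    if differs(direct_outside, direct_inside):
        suspects.append("overlay network")
    if differs(direct_inside, via_lb_inside):
        suspects.append("load balancer")
    return suspects or ["no clear bottleneck"]
```

For example, with hypothetical readings of 20_000, 19_000 and 2_000 req/sec, `diagnose(20_000, 19_000, 2_000)` points at the load balancer only.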

meskyanichi · March 26, 2016, 7:23pm

I’m not noticing significant improvements in v1.0.0-rc3. However, it appears that a large chunk of the poor throughput occurs when using the HTTP/HTTPS protocol. When using TCP I’m seeing 25_000 rps, instead of 2000-4000 rps.

The following benchmarks are performed using wrk, and TCP rather than HTTP/HTTPS.

internet wrk        -> host network container = 65_000 rps

internet wrk        -> managed network container = 50_000 rps
managed network wrk -> managed network container = 30_000 rps

internet wrk        -> haproxy -> managed network container = 25_000 rps
managed network wrk -> haproxy -> managed network container = 30_000 rps

While 25_000 rps isn’t nearly what haproxy should be capable of handling, it is at least much better than 2000-4000 rps.
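To make the size of each step easier to compare, the drops can be expressed as percentages. The sketch below uses only the rps figures listed above; the dictionary keys, the comparison labels and the helper names are made up for illustration.

```python
# Throughput figures (requests/sec) copied from the wrk runs above.
RESULTS = {
    "internet -> host network container": 65_000,
    "internet -> managed network container": 50_000,
    "managed -> managed network container": 30_000,
    "internet -> haproxy -> managed network container": 25_000,
    "managed -> haproxy -> managed network container": 30_000,
}

# (label, baseline run, run with one extra component in the path)
COMPARISONS = [
    ("managed network vs host network, internet client",
     "internet -> host network container",
     "internet -> managed network container"),
    ("adding haproxy, internet client",
     "internet -> managed network container",
     "internet -> haproxy -> managed network container"),
    ("adding haproxy, managed network client",
     "managed -> managed network container",
     "managed -> haproxy -> managed network container"),
]


def relative_drop(before, after):
    """Return the throughput lost going from `before` to `after`, in percent."""
    return (before - after) / before * 100


def report():
    """Return (label, percent drop rounded to one decimal) for each comparison."""
    return [
        (label, round(relative_drop(RESULTS[base], RESULTS[extra]), 1))
        for label, base, extra in COMPARISONS
    ]


for label, pct in report():
    print(f"{label}: {pct}% drop")
```

With these figures the managed network costs about 23% from the internet, haproxy costs 50% from the internet, and haproxy costs nothing measurable from inside the managed network.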

vincent · March 26, 2016, 8:03pm

My guess would be that the 25k to 2k drop is related in some way to the keep-alive interaction between client <-> haproxy <-> target.

The rest is likely the additional hops through network agents, though some of the drops are surprising. How many balancer and target containers are there? Are they on the same host? We don’t currently have any way to prioritize targets on the same host, so with more than one target, or with haproxy on a different host, every request takes this path:

Client -> haproxy on host A -> network agent on A --encrypted IPSec tunnel--> network agent on host B -> target container on B
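One rough way to check the keep-alive guess is to send the same number of requests once over a single reused connection and once with a new connection per request, and compare the two rates. The sketch below uses Python's standard `http.client`; the function name `requests_per_second` and its parameters are assumptions for illustration. It is single-threaded, so the absolute numbers will be far below what wrk reports; only the ratio between the two modes is of interest.

```python
import http.client
import time


def requests_per_second(host, port, n=200, keep_alive=True, path="/"):
    """Send n sequential GET requests and return the achieved requests/sec.

    keep_alive=True reuses one TCP connection for all requests.
    keep_alive=False opens and closes a connection for every request.
    """
    conn = None
    start = time.perf_counter()
    for _ in range(n):
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
        headers = {} if keep_alive else {"Connection": "close"}
        conn.request("GET", path, headers=headers)
        conn.getresponse().read()  # drain the body so the connection can be reused
        if not keep_alive:
            conn.close()
            conn = None
    if conn is not None:
        conn.close()
    elapsed = time.perf_counter() - start
    return n / elapsed
```

Running it with `keep_alive=True` and then `keep_alive=False` against the balancer address, and again against a target container directly, should show whether connection setup accounts for a large share of the HTTP drop.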

meskyanichi · March 26, 2016, 9:27pm

I was running two hosts in this case, one for haproxy and one for the container.

RVN_BR · March 27, 2016, 8:48pm

Is there any effort/issue that would give “local” containers priority in Rancher LB or some other way? It’s quite interesting and useful imho…

esp when dealing with different backend services…
