Unknown failure in multihost loadtest

Hi Rancher Labs,

I ran a few load tests with this script and saw some behaviour in the infrastructure that I cannot explain.




Now I run this "loadtest script" on a third VM.


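i=0
while [ $i -lt 100000 ]; do
  echo "attempt $i"
  # (the command that downloads 4K.txt through the LB is missing from the original post)
  rm -f 4K.txt
  sleep 0.01
  i=$((i + 1))
done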

After a short while (10-30 seconds) I just receive a 503 with "no servers available to handle the request".
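To pin down exactly when the first 503 appears, the loop can be sketched so that it stops at the first non-200 response and prints the elapsed time. This is only a sketch: the URL is a hypothetical placeholder (the post does not show the address used), and the helper names fetch_status and find_first_failure are made up for illustration.

```shell
#!/bin/sh
# Hypothetical URL; replace with the address of your load balancer.
URL="${URL:-http://lb.example.invalid/4K.txt}"

# Print the HTTP status code of one request ("000" if the connection fails).
fetch_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Run the probe command "$1" up to $2 times and report the first
# attempt whose status code is not 200.
find_first_failure() {
  probe=$1
  max=$2
  n=0
  start=$(date +%s)
  while [ "$n" -lt "$max" ]; do
    n=$((n + 1))
    code=$($probe)
    if [ "$code" != "200" ]; then
      echo "attempt $n: HTTP $code after $(( $(date +%s) - start ))s"
      return 1
    fi
  done
  echo "all $max attempts returned 200"
}
```

It could be started with: find_first_failure "fetch_status $URL" 100000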

You can also run this one:

ab -n 100000 -c 32
Then the 503 comes after 3-5 seconds…

At that point my LB is no longer able to ping the nginx container.

If I then wait a few minutes (5-10 min), my LB is able to ping it again…

If I run the test again, it fails again after a few seconds…

Could you please repeat that test in your infrastructure? Maybe I have a bug in my infrastructure, or it’s in the multi-host networking implementation. I hope it’s the first one :wink:



I just set up a second environment on completely different infrastructure (two laptops).

The result is the same.

After a couple of seconds the LB responds with a 503 (no servers available) and the LB isn’t able to ping the nginx container on the other laptop.

After waiting 5-10 minutes, the ping works again…

If I test on just one host it works fine, but of course I want to use Rancher in a multi-host scenario…

I’m using CentOS 7 as the base image for the hosts. Maybe the issue is related to that; I’m going to test again on Ubuntu 14.04 Server.

This sounds like you’re probably running out of ephemeral ports in the balancer container because it’s a large number of short requests coming from a single source…

Here’s a decent explanation: https://www.box.com/blog/ephemeral-port-exhaustion-and-web-services-at-scale/

I do not think the balancer container currently has any tuning for this, but it should…

Or apparently there’s a similar sort of issue with the IPSec routing table for the overlay network, which is probably more likely since you said a single host works fine. There’s a fix for that in the next release.
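To check the ephemeral-port theory on the balancer host, a rough sketch like the following could help. It assumes a Linux host, a 60-second TIME_WAIT period, and that the balancer is the side closing the connection first; the helper names port_count and max_conn_rate are made up for illustration.

```shell
#!/bin/sh
# Size of a port range given its low and high bounds (inclusive).
port_count() {
  echo $(( $2 - $1 + 1 ))
}

# Rough ceiling on new connections per second to one destination ip:port,
# assuming each closed connection holds its source port for $2 seconds.
max_conn_rate() {
  echo $(( $1 / $2 ))
}

# Read the local range if available; otherwise fall back to a common Linux default.
range=$(cat /proc/sys/net/ipv4/ip_local_port_range 2>/dev/null || echo "32768 60999")
ports=$(port_count $range)
echo "ephemeral ports: $ports"
echo "approx. max new connections/s to one backend: $(max_conn_rate "$ports" 60)"

# Count sockets currently in TIME-WAIT, if ss is installed.
if command -v ss >/dev/null 2>&1; then
  echo "TIME-WAIT sockets: $(ss -tan | grep -c TIME-WAIT)"
fi
```

With the range 32768-60999 this gives 28232 ports, or roughly 470 new connections per second to a single backend. A TIME-WAIT count close to the port total during the test would point towards port exhaustion; a low count would point elsewhere, for example at the overlay network.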

Hey vincent,

thank you for your fast response.

When is the release date?

Were you able to reproduce my problem? It would be really nice if that were fixed. If you want to go to production with high-load applications, it’s a must, I guess…


There is a fix for the IPSec issue in v0.42.0. Could you test with that version to see if it fixes your issue? It fixed similar issues for another person.

Hi denise,

thank you for the update. It’s working now.
Thank you for the fix!