Unknown failure in multi-host load test

Hi Rancher Labs,

I ran a few load tests with the script below and saw some behaviour in the infrastructure that I cannot explain.

…

Now I run this "load test script" on a 3rd VM:

#!/bin/bash
# Fetch a 4K test file through the load balancer in a tight loop
i=0

while [ $i -lt 100000 ]
do
    echo "attempt $i"
    wget http://10.8.31.150/4K.txt
    rm -f 4K.txt
    sleep 0.01
    i=$((i + 1))
done

After a couple of seconds (10-30), I just receive a 503 with "no servers available to handle the request".

You can also run this one:

ab -n 100000 -c 32 http://10.8.31.150:80/4K.txt
Then the 503 comes after 3-5 seconds…

At that point my LB is not able to ping 10.42.244.70 (nginx) anymore.

If I then wait a few minutes (5-10 min), my LB is able to ping 10.42.244.70 again…

If I run the test again, it fails after a few seconds again…

Could you please repeat that test in your infrastructure? Maybe I have a bug in my infrastructure… or it's in the multi-host networking implementation… I hope it's the first one :wink:

Regards,
Florian

Hey,

I just set up a 2nd environment on completely different infrastructure (two laptops).

The result is the same.

After a couple of seconds the LB responds with a 503 (no servers available) and the LB isn't able to ping the nginx container on the other laptop.

After 5-10 minutes of waiting the ping works again…

If I just test on 1 host it works fine… but of course I want to use Rancher in a multi-host scenario…

I'm using CentOS 7 as the base image for the hosts. Maybe it's an issue with that… I'm going to test it again under Ubuntu 14.04 Server.

This sounds like you’re probably running out of ephemeral ports in the balancer container because it’s a large number of short requests coming from a single source…

Here’s a decent explanation: https://www.box.com/blog/ephemeral-port-exhaustion-and-web-services-at-scale/

I do not think the balancer container currently has any tuning for this, but it should…
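If you want to confirm that, here's a rough sketch (assuming you can get a shell in the balancer container; nothing Rancher-specific, just standard Linux tooling):

# Count connections stuck in TIME_WAIT, the usual symptom of ephemeral port exhaustion
ss -tan state time-wait | wc -l

# See how wide the ephemeral port range is to begin with
sysctl net.ipv4.ip_local_port_range

# Possible tuning (needs a privileged container, or set it on the host):
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
sysctl -w net.ipv4.tcp_tw_reuse=1

If the TIME_WAIT count climbs toward the size of the port range right when the 503s start, that would point at this explanation.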

Or apparently there’s a similar sort of issue with the IPSec routing table for the overlay network, which is probably more likely since you said a single host works fine. There’s a fix for that in the next release.
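If it does turn out to be the IPSec side, one way to watch it is the kernel's XFRM state on the hosts themselves while the test is running (again just a sketch using plain iproute2 commands, not anything Rancher-specific):

# On each host, while the load test is failing:
ip -s xfrm state    # IPSec SAs with packet/byte/error counters
ip xfrm policy      # policies that steer the overlay traffic into the tunnels
ip xfrm monitor     # watch SAs being added and expired in real time

If the SAs disappear or the error counters jump at the moment the pings start failing, that matches the overlay/IPSec problem the upcoming fix addresses.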

Hey Vincent,

thank you for your fast response.

When is the release date?

Were you able to reproduce my problem? It would be really nice if that were fixed. If you want to go to production with high-load applications it's a must, I guess…

Regards,
Florian

There is a fix for the IPSec issue in v0.42.0. Could you test with that version to see if it fixes your issue? It fixed another person's similar problems.

Hi Denise,

Thank you for the update. It's working now.
Thank you for the fix!

Regards,
Florian