Unknown failure in multi-host load test

Hi Rancher Labs,

I ran a few load tests with the script below and saw some behaviour in the infrastructure that I cannot explain.

…

Now I run this "load test script" on a 3rd VM:

#!/bin/bash
# Fetch a 4K test file through the load balancer in a tight loop
i=0

while [ $i -lt 100000 ]
do
    echo "attempt $i"
    wget http://10.8.31.150/4K.txt
    rm -f 4K.txt
    sleep 0.01
    i=$((i + 1))
done

After a couple of seconds (10-30), I just receive a 503 with "no servers available to handle the request".

You can also run this one:

ab -n 100000 -c 32 http://10.8.31.150:80/4K.txt
Then the 503 comes after 3-5 seconds…

At that point my LB is not able to ping 10.42.244.70 (nginx) anymore.

If I then wait a few minutes (5-10 min), my LB is able to ping 10.42.244.70 again…

If I run the test again, it fails after a few seconds again…

Could you please repeat that test in your infrastructure? Maybe I have a bug in my infrastructure… or it's in the multi-host networking implementation… I hope it's the first one :wink:

Regards,
Florian

Hey,

I just set up a 2nd environment on completely different infrastructure (two laptops).

The result is the same.

After a couple of seconds the LB responds with a 503 (no servers available) and the LB isn't able to ping the nginx container on the other laptop.

After 5-10 minutes of waiting the ping works again…

If I just test on 1 host it works fine… but of course I want to use Rancher in a multi-host scenario…

I'm using CentOS 7 as the base image for the hosts. Maybe it's an issue with that… I'm going to test it again under Ubuntu 14.04 Server.

This sounds like you’re probably running out of ephemeral ports in the balancer container because it’s a large number of short requests coming from a single source…

Here’s a decent explanation: https://www.box.com/blog/ephemeral-port-exhaustion-and-web-services-at-scale/

I do not think the balancer container currently has any tuning for this, but it should…
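If you want to confirm that, here's a rough sketch (assuming you can get a shell in the balancer container; nothing Rancher-specific, just standard Linux tooling):

# Count connections stuck in TIME_WAIT, the usual symptom of ephemeral port exhaustion
ss -tan state time-wait | wc -l

# See how wide the ephemeral port range is to begin with
sysctl net.ipv4.ip_local_port_range

# Possible tuning (needs a privileged container, or set it on the host):
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
sysctl -w net.ipv4.tcp_tw_reuse=1

If the TIME_WAIT count climbs toward the size of the port range right when the 503s start, that would point at this explanation.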

Or apparently there’s a similar sort of issue with the IPSec routing table for the overlay network, which is probably more likely since you said a single host works fine. There’s a fix for that in the next release.
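If it does turn out to be the IPSec side, one way to watch it is the kernel's XFRM state on the hosts themselves while the test is running (again just a sketch using plain iproute2 commands, not anything Rancher-specific):

# On each host, while the load test is failing:
ip -s xfrm state    # IPSec SAs with packet/byte/error counters
ip xfrm policy      # policies that steer the overlay traffic into the tunnels
ip xfrm monitor     # watch SAs being added and expired in real time

If the SAs disappear or the error counters jump at the moment the pings start failing, that matches the overlay/IPSec problem the upcoming fix addresses.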

Hey Vincent,

thank you for your fast response.

When is the release date?

Were you able to reproduce my problem? It would be really nice if that were fixed. If you want to go to production with high-load applications it's a must, I guess…

Regards,
Florian

There is a fix for the IPSec issue in v0.42.0. Could you test with that version to see if it fixes your issue? It fixed another person's similar problems.

Hi Denise,

Thank you for the update. It's working now.
Thank you for the fix!

Regards,
Florian