Too many layers of proxies dragging down performance

I’ve got a set of 4 RancherOS nodes in a Rancher cluster. The containers are mostly web apps, so I use the HAProxy load balancer built into Rancher to direct traffic to the right containers. In front of the HAProxy nodes sits an Elastic Load Balancer, and in front of the ELB, CloudFlare (mostly for the free SSL). In theory, a CDN that properly caches assets should improve performance for most users, but that just isn’t playing out in practice. Today I ran some tests with ApacheBench to see how much each additional layer drags down performance, and the overhead is excessive.

# from inside container on RancherOS server
ab -n 10000 -c 100 -H "Host: dreamscarred.com" http://localhost/license.txt

Finished 10000 requests

Server Software:        nginx/1.4.6
Server Hostname:        localhost
Server Port:            80

Document Path:          /license.txt
Document Length:        19935 bytes

Concurrency Level:      100
Time taken for tests:   3.837 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      201800000 bytes
HTML transferred:       199350000 bytes
Requests per second:    2605.94 [#/sec] (mean)
Time per request:       38.374 [ms] (mean)
Time per request:       0.384 [ms] (mean, across all concurrent requests)
Transfer rate:          51355.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   3.0      0      32
Processing:     1   37  13.7     36     104
Waiting:        1   35  13.5     34     104
Total:          1   38  13.6     38     104

Percentage of the requests served within a certain time (ms)
  50%     38
  66%     41
  75%     43
  80%     45
  90%     53
  95%     65
  98%     82
  99%     87
 100%    104 (longest request)

# going through haproxy from my laptop at home
ab -n 10000 -c 100 -H "Host: dreamscarred.com" https://wolfsbane.windsofstorm.net/license.txt

Finished 10000 requests


Server Software:        nginx/1.4.6
Server Hostname:        wolfsbane.windsofstorm.net
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        dreamscarred.com

Document Path:          /license.txt
Document Length:        19935 bytes

Concurrency Level:      100
Time taken for tests:   83.196 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      201800000 bytes
HTML transferred:       199350000 bytes
Requests per second:    120.20 [#/sec] (mean)
Time per request:       831.960 [ms] (mean)
Time per request:       8.320 [ms] (mean, across all concurrent requests)
Transfer rate:          2368.75 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      102  582 612.9    321   11530
Processing:    54  245 148.5    237    4315
Waiting:       30  126 116.7    108    4131
Total:        201  827 643.5    577   11925

Percentage of the requests served within a certain time (ms)
  50%    577
  66%    683
  75%    790
  80%   1575
  90%   1788
  95%   1938
  98%   2075
  99%   2926
 100%  11925 (longest request)

# going through the Elastic Load Balancer in front of HAProxy from my laptop at home
ab -n 10000 -c 100 -H "Host: dreamscarred.com" http://wos-elb-classic-1604117553.us-west-2.elb.amazonaws.com/license.txt

Finished 10000 requests


Server Software:        nginx/1.4.6
Server Hostname:        wos-elb-classic-1604117553.us-west-2.elb.amazonaws.com
Server Port:            80

Document Path:          /license.txt
Document Length:        19935 bytes

Concurrency Level:      100
Time taken for tests:   38.867 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      201800000 bytes
HTML transferred:       199350000 bytes
Requests per second:    257.29 [#/sec] (mean)
Time per request:       388.672 [ms] (mean)
Time per request:       3.887 [ms] (mean, across all concurrent requests)
Transfer rate:          5070.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       34  260 536.0     43    3916
Processing:    75  122  74.3     97    1047
Waiting:       38   77  73.3     52     999
Total:        116  382 541.8    142    4174

Percentage of the requests served within a certain time (ms)
  50%    142
  66%    149
  75%    297
  80%    337
  90%   1416
  95%   1433
  98%   1625
  99%   2687
 100%   4174 (longest request)

# going through CloudFlare from my laptop at home to simulate normal requests
ab -n 10000 -c 100 https://dreamscarred.com/license.txt

Finished 10000 requests


Server Software:        cloudflare
Server Hostname:        dreamscarred.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-ECDSA-CHACHA20-POLY1305,256,256
TLS Server Name:        dreamscarred.com

Document Path:          /license.txt
Document Length:        19935 bytes

Concurrency Level:      100
Time taken for tests:   138.384 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      204470000 bytes
HTML transferred:       199350000 bytes
Requests per second:    72.26 [#/sec] (mean)
Time per request:       1383.839 [ms] (mean)
Time per request:       13.838 [ms] (mean, across all concurrent requests)
Transfer rate:          1442.93 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      133  926 693.5    644    6029
Processing:    94  450 265.1    405    4988
Waiting:       55  195 143.4    165    4166
Total:        337 1375 788.5   1031    6272

Percentage of the requests served within a certain time (ms)
  50%   1031
  66%   1197
  75%   1820
  80%   2063
  90%   2557
  95%   2775
  98%   3637
  99%   3961
 100%   6272 (longest request)

It seems like the nginx containers themselves are doing fine, but RPS immediately tanks as soon as HAProxy enters the picture (from 2605.94 [#/sec] to 120.20 [#/sec]), and each additional layer stacked on top reduces it further.
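A rough way to quantify that is to divide the baseline req/sec by each layer’s req/sec. A small shell sketch using the numbers copied from the ab runs above:

```shell
#!/bin/sh
# Slowdown of each layer relative to the in-container nginx baseline
# (2605.94 req/s). The req/s figures are taken from the ab output above.
slowdown() {
    awk -v base=2605.94 -v rps="$1" 'BEGIN { printf "%.1f", base/rps }'
}

echo "haproxy (tls, from home): $(slowdown 120.20)x"   # ~21.7x
echo "elb (http, from home):    $(slowdown 257.29)x"   # ~10.1x
echo "cloudflare (tls):         $(slowdown 72.26)x"    # ~36.1x
```

Caveat: the runs differ in more than just the extra hop (TLS vs. plain HTTP, home-network latency), so these multiples are upper bounds on pure proxy overhead rather than clean per-layer measurements.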

I don’t really want to give up CloudFlare, given the bonuses they provide (especially security, considering how frequently WordPress is a target), but it seems to me that the ELB and/or HAProxy setups could potentially be optimized. Right now I’m running just one ELB, as they cost about $20 a month last I checked, and I don’t want to spend more on ELBs than I do on EC2 instances.
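On the HAProxy side, a few tunables are worth checking before blaming the architecture. A hypothetical config fragment (these values are illustrative assumptions, not taken from the Rancher setup described above):

```
# hypothetical haproxy.cfg fragment -- values are illustrative only
global
    maxconn 20000          # raise the process-wide connection ceiling

defaults
    mode http
    option http-keep-alive # reuse backend connections instead of opening
                           # a new one per request
    timeout connect 5s
    timeout client  30s
    timeout server  30s
```

If the Rancher-managed config leaves `maxconn` at a low default or closes backend connections after every request, that alone could account for a chunk of the drop under 100-way concurrency.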

If I understood your testing environment right, the req/sec rate drops as soon as you start testing from your laptop at home, which is to be expected (network latency, bufferbloat, etc.).

To get an accurate picture, I suggest you repeat all tests from a server located near the cluster (and with enough cores to handle the concurrency level you’re specifying).
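It’s also worth sanity-checking the load generator itself before re-running: ab is single-threaded, so it can saturate one core, and each concurrent connection needs a file descriptor. A quick sketch:

```shell
#!/bin/sh
# Sanity-check the machine that will run ab: print the core count and the
# open-file limit, which must comfortably exceed ab's -c concurrency (100 here).
nproc
ulimit -n
```

If `ulimit -n` comes back near or below the concurrency level, raise it before trusting the numbers.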