Overlay network performance

Hi, I’m testing Rancher (Cattle) and I’m noticing slow network performance.
I’ve created a Rancher environment with 2 agent nodes (public1, public2)
and added a stack with one container on each node (agent1, agent2).

These are the measurements I get with iperf (iperf -i 1 -t 3):

  • from public2 to container1

[ 3] 0.0- 1.0 sec 113 MBytes 948 Mbits/sec
[ 3] 1.0- 2.0 sec 113 MBytes 947 Mbits/sec
[ 3] 2.0- 3.0 sec 112 MBytes 940 Mbits/sec
[ 3] 0.0- 3.0 sec 338 MBytes 943 Mbits/sec

  • from agent2 to public1

[ 3] 0.0- 1.0 sec 113 MBytes 950 Mbits/sec
[ 3] 1.0- 2.0 sec 112 MBytes 935 Mbits/sec
[ 3] 2.0- 3.0 sec 111 MBytes 934 Mbits/sec
[ 3] 0.0- 3.0 sec 336 MBytes 940 Mbits/sec

  • from public1 to agent2

[ 3] 0.0- 1.0 sec 12.8 MBytes 107 Mbits/sec
[ 3] 1.0- 2.0 sec 16.5 MBytes 138 Mbits/sec
[ 3] 2.0- 3.0 sec 9.88 MBytes 82.8 Mbits/sec
[ 3] 0.0- 3.0 sec 39.2 MBytes 109 Mbits/sec

  • from public2 to agent2

[ 3] 0.0- 1.0 sec 828 MBytes 6.94 Gbits/sec
[ 3] 1.0- 2.0 sec 844 MBytes 7.08 Gbits/sec
[ 3] 2.0- 3.0 sec 859 MBytes 7.20 Gbits/sec
[ 3] 0.0- 3.0 sec 2.47 GBytes 7.08 Gbits/sec

  • from agent1 to agent2

[ 3] 0.0- 1.0 sec 14.4 MBytes 121 Mbits/sec
[ 3] 1.0- 2.0 sec 14.0 MBytes 117 Mbits/sec
[ 3] 2.0- 3.0 sec 11.5 MBytes 96.5 Mbits/sec
[ 3] 0.0- 3.0 sec 40.0 MBytes 111 Mbits/sec
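For reference, each of the tests above was run roughly like this (a sketch; the server runs on the receiving side, the client on the sending side, and <target-ip> is a placeholder):

iperf -s                         # on the receiving host or container
iperf -c <target-ip> -i 1 -t 3   # on the sending side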

AES-NI is enabled on both public1 and public2; openssl speed (openssl speed -evp aes-128-cbc) on the slower public2 (C2350) returns:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     152309.93k   235208.51k   281690.41k   297612.97k   302000.81k

while on public1 (C2750):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     214204.88k   326156.29k   388896.85k   408870.57k   414878.38k
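For reference, one way to confirm the AES-NI flag and reproduce the benchmark (a sketch, not necessarily how it was checked here):

grep -m1 -o -w aes /proc/cpuinfo   # prints "aes" if the CPU advertises AES-NI
openssl speed -evp aes-128-cbc     # throughput per block size, in 1000s of bytes per second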

Based on the openssl speed results I think the connection between containers on separate hosts should have a bandwidth of ~200Mbps, but it seems to be stuck at a little more than 100Mbps.

I’ve seen noticeable ksoftirqd CPU usage (~15-30%) while an instance of iperf -s in a container is receiving data from the other host or from the other container.
Running iperf -s directly on the host does not generate noticeable ksoftirqd CPU usage while receiving data from either the other host or the other container.
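To quantify the softirq load during a transfer, something like this works (a sketch; mpstat comes from the sysstat package):

mpstat -P ALL 1   # the %soft column shows per-CPU softirq time
top -d 1          # watch the ksoftirqd/N kernel threads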

I think it’s important to know:

  • host OS and kernel
  • maybe the Docker version
  • baseline performance and network architecture

I tried CentOS 7.2 (default kernel) and Ubuntu 14.04 (default kernel).
The performance was comparable in both cases.

Docker was 1.10.3 on both.
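If it helps, the host details above can be gathered with something like (a sketch):

cat /etc/os-release   # distribution and release
uname -r              # running kernel
docker version        # Docker engine/client version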

The hosts are dedicated online.net servers. As mentioned in the previous post (not very clearly, sorry), baseline performance on bare metal was around 1Gbps.
More precisely: when the iperf target was bare metal, the bandwidth was ~1Gbps (even when the client was a container on the overlay network), while an iperf target on the Rancher overlay reached only ~100Mbps.
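Roughly, the two cases looked like this (a sketch; container names and IPs are placeholders, and it assumes iperf is installed in the container image):

docker exec agent1 iperf -c <bare-metal-ip-of-public2> -i 1 -t 3   # target on bare metal: ~1Gbps
docker exec agent1 iperf -c <overlay-ip-of-agent2> -i 1 -t 3       # target on the Rancher overlay: ~100Mbps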

I noticed on the rancher/agent-instance container that IPsec was configured to use aes-128-cbc (swanctl --list-sas returned AES_CBC-128/AES_XCBC_96/PRF_AES128_XCBC/MODP_2048), so I benchmarked my CPUs with openssl speed. The slowest CPU I used was able to encrypt ~300MB/s for blocks larger than 1024 bytes.
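For anyone who wants to check the same thing, the SAs can be listed from the network agent container with something like (a sketch; the container name is a placeholder):

docker exec -it <agent-instance-container> swanctl --list-sas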

Quoting a study:

From our testing we can conclude that Docker bridge networking and even Rancher’s overlay networks do not cause a significant overhead for long running TCP connections. This is relevant to Databases and Caches as they often use long-running connections between their clients and servers with large throughput requirements. From our testing this workload should not pose problems for Docker or Rancher.

Source:
http://techtraits.com/dockerperf.html