We currently have a problem in our Kubernetes environment (RKE, on-premise): throughput appears to be very low. When we run our Gatling load tests against a Jira instance deployed in the cluster with two replicas, Dynatrace reports the following error:
- Packet retransmission rate for process nginx nginx-ingress-controller (nginx-ingress-controller-m2m72) on host has increased to 28 %
In the Nginx logs we get the following messages several times:
- 2022/03/07 13:59:11 [error] 32#32: *5251 upstream prematurely closed connection while sending to client, client: 10.42.18.0, server: [Jira-URL], request: [Response body]
- 2022/03/07 13:59:15 [error] 32#32: *5228 upstream prematurely closed connection while sending to client, client: 10.42.13.0, server: [Jira-URL], request: [Response body]
- 2022/03/07 13:59:17 [error] 32#32: *4998 upstream prematurely closed connection while sending to client, client: 172.31.xx.xx, server: [Jira-URL], request: [Response body]
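To see which clients these errors cluster on, we aggregate them roughly like this (a minimal sketch; the helper name and the piping are ours, not part of ingress-nginx):

```shell
# top_closed_clients: count "upstream prematurely closed" errors per client IP.
# Pipe the controller log into it, e.g.:
#   kubectl -n ingress-nginx logs nginx-ingress-controller-m2m72 | top_closed_clients
top_closed_clients() {
  grep 'upstream prematurely closed connection' \
    | sed -n 's/.*client: \([0-9.]*\),.*/\1/p' \
    | sort | uniq -c | sort -rn
}
```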
What we also don't understand about the messages above is why the client is sometimes an address from the PodCIDR (10.42.18.0) and sometimes the IP of a worker node (172.31.xx.xx).
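To at least classify the logged client addresses, we check whether an IP falls inside the cluster's pod CIDR (a sketch; `in_cidr` is our own helper, and 10.42.0.0/16 is RKE's default cluster CIDR, so adjust it to your setup):

```shell
# in_cidr IP CIDR: exit 0 if IP lies inside CIDR (POSIX awk, no bitwise ops).
in_cidr() {
  awk -v ip="$1" -v cidr="$2" '
    function to_int(a,  p) { split(a, p, "."); return ((p[1]*256 + p[2])*256 + p[3])*256 + p[4] }
    BEGIN {
      split(cidr, c, "/")
      shift = 2 ^ (32 - c[2])   # size of the host part
      exit !(int(to_int(ip) / shift) == int(to_int(c[1]) / shift))
    }'
}

# prints "pod network" for the client seen in the nginx log above
in_cidr 10.42.18.0 10.42.0.0/16 && echo "pod network" || echo "outside pod network"
```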
We have already set numerous parameters as well as sysctl settings, but we don't understand where the packet retransmissions come from or how to analyze the problem further.
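So far the only raw numbers we can get are the kernel's own TCP counters. A minimal sketch of how we read them (assumes a Linux host, or a shell inside the controller pod; the helper name is ours):

```shell
# tcp_retrans_rate FILE: print RetransSegs, OutSegs and their ratio from a
# /proc/net/snmp-style file. These counters are cumulative since boot, so
# sample twice and diff the values to get a rate over an interval.
tcp_retrans_rate() {
  awk '
    /^Tcp:/ && !have_header { n = split($0, h); have_header = 1; next }
    /^Tcp:/                 { split($0, d) }
    END {
      for (i = 2; i <= n; i++) {
        if (h[i] == "OutSegs")     out = d[i]
        if (h[i] == "RetransSegs") retrans = d[i]
      }
      if (out > 0)
        printf "RetransSegs=%d OutSegs=%d rate=%.2f%%\n", retrans, out, 100 * retrans / out
    }' "$1"
}

# On the node (or via kubectl exec into the controller pod):
[ -r /proc/net/snmp ] && tcp_retrans_rate /proc/net/snmp || true
```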
Via the ConfigMap we pass the following parameters to the Nginx ingress controller:

disable-access-log: true
worker-processes: "auto"
max-worker-connections: 65536
upstream-keepalive-timeout: 315
keep-alive: 315
keep-alive-requests: "1000000"
use-geoip: false
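For completeness, the same settings as a full manifest (a sketch; the ConfigMap name and namespace are assumptions, adjust them to your deployment; note that ConfigMap values must be strings):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration    # name/namespace assumed, adjust to your setup
  namespace: ingress-nginx
data:
  disable-access-log: "true"
  worker-processes: "auto"
  max-worker-connections: "65536"
  upstream-keepalive-timeout: "315"
  keep-alive: "315"
  keep-alive-requests: "1000000"
  use-geoip: "false"
```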
We also adjust the following sysctl parameters via an extraInitContainer:
sysctl -w net.core.somaxconn=65536
sysctl -w net.ipv4.tcp_max_tw_buckets=1440000
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w fs.file-max=1048576
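The init container itself looks roughly like this (a sketch; the image choice and the `extraInitContainers` values key depend on the chart version, so treat the exact keys as assumptions):

```yaml
extraInitContainers:
  - name: sysctl-tuning
    image: busybox:1.35        # image choice is ours
    securityContext:
      privileged: true         # needed to write the sysctls
    command:
      - /bin/sh
      - -c
      - |
        sysctl -w net.core.somaxconn=65536
        sysctl -w net.ipv4.tcp_max_tw_buckets=1440000
        sysctl -w net.ipv4.ip_local_port_range="1024 65535"
        sysctl -w fs.file-max=1048576
```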
Does anyone have ideas, or even a potential solution, for resolving or at least narrowing down the packet retransmission problem?