We currently have a problem in our Kubernetes environment (RKE, on-premise): throughput appears to be very low.
When we run our load tests (Gatling) against a Jira instance deployed in the cluster with two replicas, we get the following error message via Dynatrace:
- Packet retransmission rate for process nginx nginx-ingress-controller (nginx-ingress-controller-m2m72) on host has increased to 28 %
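To narrow down where the retransmissions occur, the kernel's TCP counters on the affected node can be inspected directly while the load test runs (a minimal sketch, assuming shell access to the node hosting the controller pod):

# Cumulative retransmission counters from the kernel
netstat -s | grep -i retrans

# Same counters via iproute2, including zero values
nstat -az TcpRetransSegs TcpExtTCPLostRetransmit

# Per-connection retransmission details while Gatling is running
ss -ti | grep -B1 retrans

Comparing these counters on the ingress node with those on the nodes running the Jira pods should at least show whether the loss happens in front of or behind the controller.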
In the Nginx logs we get the following messages several times:
- 2022/03/07 13:59:11 [error] 32#32: *5251 upstream prematurely closed connection while sending to client, client: 10.42.18.0, server: [Jira-URL], request: [Response body]
- 2022/03/07 13:59:15 [error] 32#32: *5228 upstream prematurely closed connection while sending to client, client: 10.42.13.0, server: [Jira-URL], request: [Response body]
- 2022/03/07 13:59:17 [error] 32#32: *4998 upstream prematurely closed connection while sending to client, client: 172.31.xx.xx, server: [Jira-URL], request: [Response body]
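To see which side actually closes the connection first, the traffic between the controller and a Jira pod could be captured during a test run (a sketch; the pod IP 10.42.x.x and port 8080 are placeholders for the actual Jira upstream):

# Capture controller <-> Jira traffic on the ingress node
tcpdump -i any host 10.42.x.x and port 8080 -w jira-upstream.pcap

# Afterwards, check who sends the first FIN/RST
tshark -r jira-upstream.pcap -Y 'tcp.flags.reset == 1 or tcp.flags.fin == 1'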
What we also don’t understand in these log entries is why sometimes an address from the pod CIDR (10.42.18.0) is logged as the client and sometimes the IP of a worker node (172.31.xx.xx).
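A quick check of what such an address belongs to (sketch, using the IP from the log above):

# Does the address belong to a pod?
kubectl get pods -A -o wide | grep 10.42.18.0

# Or is it part of a node's pod CIDR?
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'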
We have already set numerous parameters as well as sysctl settings, but we don’t understand where the packet retransmissions are coming from or how we can analyze the problem further.
Via the ConfigMap we pass the following parameters to the Nginx ingress controller:
disable-access-log: "true"
worker-processes: "auto"
max-worker-connections: "65536"
upstream-keepalive-timeout: "315"
keep-alive: "315"
keep-alive-requests: "1000000"
use-geoip: "false"
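To rule out that one of these parameters did not take effect, the rendered nginx.conf inside the controller pod can be checked (a sketch; the namespace ingress-nginx is an assumption, the pod name is the one from the Dynatrace alert):

# Check which values actually ended up in the rendered nginx.conf
kubectl -n ingress-nginx exec nginx-ingress-controller-m2m72 -- \
  grep -E 'worker_processes|worker_connections|keepalive_timeout|keepalive_requests' /etc/nginx/nginx.conf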
Via an extraInitContainer we also adjust the following sysctl parameters:
sysctl -w net.core.somaxconn=65536
sysctl -w net.ipv4.tcp_max_tw_buckets=1440000
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w fs.file-max=1048576
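Whether these values are actually active in the pod's network namespace can be verified by reading them back from /proc (sketch; same namespace/pod assumptions as above — note that not all sysctls are namespaced, e.g. fs.file-max is a global kernel setting):

# Read the values back from within the controller pod
kubectl -n ingress-nginx exec nginx-ingress-controller-m2m72 -- \
  sh -c 'cat /proc/sys/net/core/somaxconn /proc/sys/net/ipv4/ip_local_port_range /proc/sys/net/ipv4/tcp_max_tw_buckets'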
Does anyone have ideas on how to further narrow down or solve the packet retransmission problem?