systemd's connections to /run/systemd/private ?

At $JOB, our logs are getting swamped with messages saying:

“Too many concurrent connections, refusing”

It’s hampering our ability to manage services, e.g.:

[INDENT]# systemctl status ntpd
Failed to get properties: Connection reset by peer[/INDENT]

Near as I can tell from a quick read of the source of dbus.c, we’re hitting a hard-coded limit of CONNECTIONS_MAX (set to 4096).

I think this is related to the number of connections systemd (pid 1) has to /run/systemd/private:

[INDENT]# ss -x | grep /run/systemd/private | wc -l
4015[/INDENT]

But, despite the almost 4k connections, ‘ss’ shows that there are no connected peers:

[INDENT]# ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
0[/INDENT]
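
To keep an eye on this across hosts, the same count can be wrapped in a small script. This is only a sketch: the function names and the 90% warning threshold are arbitrary choices of mine, and 4096 is simply the CONNECTIONS_MAX value as read from dbus.c above, which may differ between systemd versions.

```shell
#!/bin/sh
# Sketch only: LIMIT mirrors CONNECTIONS_MAX as read from dbus.c (assumption,
# may differ per systemd version); the 90% threshold is an arbitrary choice.
LIMIT=4096

# Count lines of `ss -x` style output (on stdin) that mention the private socket.
count_private() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c -F '/run/systemd/private' || true
}

# Print WARN when the count is at or above 90% of LIMIT, otherwise OK.
check_private() {
    n=$(count_private)
    if [ "$n" -ge $((LIMIT * 9 / 10)) ]; then
        echo "WARN $n/$LIMIT"
    else
        echo "OK $n/$LIMIT"
    fi
}

# Live use (as root): ss -x | check_private
```

With the 4015 connections shown above, this would print "WARN 4015/4096".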

  • Are there any tunables that would help us mitigate the “Too many concurrent connections, refusing” messages?

  • Is my guess about CONNECTIONS_MAX’s relationship to /run/systemd/private correct?

I wanted to put these questions directly to the systemd people, but their mail server rejects the confirmation email I need to send to sign up for their mailing list:

[INDENT]systemd-devel-request@lists.freedesktop.org:
131.252.210.177 does not like recipient.
Remote host said: 550 5.1.1 systemd-devel-request@lists.freedesktop.org: Recipient address rejected: User unknown in local recipient table
Giving up on 131.252.210.177.[/INDENT]

Hi and welcome to the Forum :slight_smile:
Sounds like an issue with ntpd?

systemctl status ntpd

● ntpd.service - NTP Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/ntpd.service.d
           └─50-insserv.conf-$time.conf
   Active: active (running) since Sat 2019-06-29 16:28:56 CDT; 1 day 21h ago
     Docs: man:ntpd(1)
  Process: 19596 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
 Main PID: 19605 (ntpd)
    Tasks: 2 (limit: 512)
   CGroup: /system.slice/ntpd.service
           ├─19605 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
           └─19606 ntpd: asynchronous dns resolver

Jun 29 16:28:56 gekkota-suma systemd[1]: Starting NTP Server Daemon...
Jun 29 16:28:56 gekkota-suma ntpd[19604]: ntpd 4.2.8p13@1.3847-o Wed Mar 13 12:24:30 UTC 2019 (1): Starting
Jun 29 16:28:56 gekkota-suma ntpd[19604]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
Jun 29 16:28:56 gekkota-suma ntpd[19605]: proto: precision = 0.053 usec (-24)
Jun 29 16:28:56 gekkota-suma ntpd[19605]: basedate set to 2019-03-01
Jun 29 16:28:56 gekkota-suma ntpd[19605]: gps base set to 2019-03-03 (week 2043)
Jun 29 16:28:56 gekkota-suma ntpd[19605]: switching logging to file /var/log/ntp
Jun 29 16:28:56 gekkota-suma start-ntpd[19596]: Starting network time protocol daemon (NTPD)
Jun 29 16:28:56 gekkota-suma systemd[1]: Started NTP Server Daemon.

ntpq

ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*time-a-g.nist.g .NIST.           1 u  584 1024  205  168.297   -0.847   1.932
+time-b-g.nist.g .NIST.           1 u  289 1024    7  165.909   -0.151   0.442

gekkota-suma:~ # ss -x | grep /run/systemd/private | wc -l
0

systemctl status dbus
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
   Active: active (running) since Sat 2019-06-29 15:16:13 CDT; 1 day 22h ago
     Docs: man:dbus-daemon(1)
 Main PID: 838 (dbus-daemon)
    Tasks: 1 (limit: 512)
   CGroup: /system.slice/dbus.service
           └─838 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Jun 29 15:16:13 gekkota-suma systemd[1]: Started D-Bus System Message Bus.
Jun 29 15:16:15 gekkota-suma dbus[838]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Jun 29 15:16:15 gekkota-suma dbus[838]: [system] Successfully activated service 'org.freedesktop.hostname1'

Nope, I can independently confirm the ntpd process is running and functioning.

The symptom here is that, depending on system activity, systemd stops being able to process new requests. systemd allows requests to come in (e.g. via an invocation of ‘systemctl’), but if I understand the source of dbus.c correctly, when there are too many connections to its outgoing stream, systemd rejects the efforts, apparently with no retry.
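
Since nothing retries on its own, a caller-side retry is one possible stopgap. The following is a minimal sketch, not a fix: the retry function and its arguments are hypothetical names of mine, and each attempt still opens a new connection to systemd, so it does nothing about the underlying connection count.

```shell
#!/bin/sh
# Sketch only: a generic retry helper (hypothetical, not part of systemd).
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example: retry 5 2 systemctl show -p ActiveState ntpd
```

The helper retries on any non-zero exit, so it only makes sense with a command whose sole expected failure is the refused connection; ‘systemctl status’ also exits non-zero for a stopped unit and would be retried needlessly.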

At $JOB, when we first spin up a new SLES12 host with our custom services, the connections to /run/systemd/private number in the mere hundreds. As workloads increase, the number of connections rises into the thousands.

Some hosts are plagued with the ‘Too many concurrent connections’ messages, some are not. Empirically, all I’ve been able to see is that on the affected hosts the number of systemd’s connections to /run/systemd/private tips over 4k.

  • I can’t demonstrate that there are any consumers of this stream.
  • I can’t explain why the connection count increases over time.
  • The CONNECTIONS_MAX constant is hard-coded; it gets increased every few months/years, but never seems to be exposed as something you can set in a config file.
  • I don’t know what tunables affect the lifetime/culling of those connections.

I have a hypothesis that this may be some resource leak in systemd, but I’ve not found a way to test that.
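
One rough way to gather evidence for or against a leak would be to sample the connection count over time and see whether it ever comes back down. This is a sketch under my own assumptions: sample_private and growth are hypothetical helper names, the log format ("epoch count") is made up for this purpose, and a count that only ever climbs is suggestive rather than proof.

```shell
#!/bin/sh
# Sketch only: helper names and log format are my own, not from systemd.

# sample_private SAMPLES INTERVAL: print "epoch count" once per interval,
# where count is the number of `ss -x` lines mentioning the private socket.
sample_private() {
    samples=$1
    interval=$2
    while [ "$samples" -gt 0 ]; do
        n=$(ss -x 2>/dev/null | grep -c -F '/run/systemd/private')
        echo "$(date +%s) $n"
        samples=$((samples - 1))
        if [ "$samples" -gt 0 ]; then
            sleep "$interval"
        fi
    done
}

# growth: read "epoch count" lines on stdin and summarise them:
# number of samples, net change from first to last, and how often the
# count dropped between consecutive samples.
growth() {
    awk 'NR == 1 { first = $2 + 0; prev = $2 + 0 }
         NR > 1 && ($2 + 0) < prev { drops++ }
         { prev = $2 + 0; last = $2 + 0 }
         END { if (NR) printf "samples=%d net=%d drops=%d\n", NR, last - first, drops }'
}

# Example: sample_private 60 60 > /tmp/private.log; growth < /tmp/private.log
```

A large positive net with drops=0 over a long window, while ‘ss’ still shows no connected peers, would be consistent with the leak hypothesis.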

Hi
So, my SLES 12 SP3 just runs SuMA, but I see no private directory. I’m assuming this is a systemd unit you’re starting for your service? It almost sounds like the process keeps respawning and maybe needs a KillMode= setting added to the service to clean them up.

Belatedly, I want to point out that I found a bug in systemd, more specifically in the version of systemd packaged with SLES12.
All the details have been captured in this email thread.

https://www.spinics.net/lists/systemd-devel/msg02784.html