An application can effectively keep a system from doing anything else even
when it is not consuming every CPU directly. That is certainly true on
non-RT systems, and I suspect the same holds on the RT side. If the four
cores dedicated to the OS are busy doing work that supports the
application, for example, that may be what is happening here.

I’ve seen cases where a runaway process has taken all of the RAM and then
starts eating into swap as well, on a system configured with too much
swap. Even though the process is single-threaded (so it only gets one core
out of sixteen), the system is effectively useless until the Out-Of-Memory
(OOM) killer takes over and nukes the lousy thing. Tying up the hard drive
is probably one of the easiest, minimal-effort ways to lock up a system,
particularly while most of the system memory is also in use. This is why
things like ulimit exist, to keep a process from using more resources than
it should, as do cgroups, and perhaps this is what your application is
using.

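
To illustrate the ulimit idea, here is a minimal sketch in Python that starts a command with a cap on its virtual address space, which is the same limit that `ulimit -v` sets in a shell. The helper name `run_with_memory_cap` and the sizes used are my own assumptions for illustration, not anything your application is known to do, and it assumes a Linux or other Unix system.

```python
import resource
import subprocess


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes; return its exit code.

    Hypothetical helper for illustration. RLIMIT_AS limits virtual address
    space, not resident memory, so it is a blunt but simple guard.
    """
    def apply_limit():
        # Runs in the child just before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode
```

With a cap like this, a runaway allocation fails inside the process itself instead of dragging the whole machine into swap. cgroups give finer control (resident memory, I/O, CPU), but they are configured outside the process rather than through a call like this.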
If you are already SSH’d into the system, or if you go to the system
console, can you still interact with it? That would mean it is just new
connections that fail.

Could you create a script to watch system resources (I/O, memory usage,
CPU utilization) while the script is running so you can gather statistics
even while disconnected or unable to interact with the system?
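
Something along these lines might do as a starting point for such a watcher. It is only a sketch, assuming a Linux system with `/proc/stat` and `/proc/meminfo`; the function names, the log format, and the five-second interval are my own choices. It uses CPU iowait as a rough stand-in for I/O pressure rather than per-device statistics.

```python
import time


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def parse_cpu_times(stat_text):
    """Return busy, iowait and total jiffies from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # First eight fields: user nice system idle iowait irq softirq steal.
            # Guest time is already counted in user, so it is left out.
            values = [int(v) for v in line.split()[1:9]]
            total = sum(values)
            idle = values[3]
            iowait = values[4] if len(values) > 4 else 0
            return {"busy": total - idle - iowait, "iowait": iowait, "total": total}
    raise ValueError("no aggregate cpu line found")


def usage_percent(prev, cur):
    """Return (busy %, iowait %) between two parse_cpu_times() samples."""
    delta = cur["total"] - prev["total"]
    if delta <= 0:
        return 0.0, 0.0
    busy = 100.0 * (cur["busy"] - prev["busy"]) / delta
    iowait = 100.0 * (cur["iowait"] - prev["iowait"]) / delta
    return busy, iowait


def read_text(path):
    with open(path) as handle:
        return handle.read()


def monitor(log_path, interval=5.0, samples=None):
    """Append one line per interval to log_path; run forever if samples is None."""
    prev = parse_cpu_times(read_text("/proc/stat"))
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            time.sleep(interval)
            cur = parse_cpu_times(read_text("/proc/stat"))
            mem = parse_meminfo(read_text("/proc/meminfo"))
            busy, iowait = usage_percent(prev, cur)
            log.write("%s cpu_busy=%.1f%% iowait=%.1f%% mem_avail_kb=%s swap_free_kb=%s\n" % (
                time.strftime("%Y-%m-%dT%H:%M:%S"), busy, iowait,
                mem.get("MemAvailable", "n/a"), mem.get("SwapFree", "n/a")))
            log.flush()  # keep the log current even if the box hangs later
            prev = cur
            taken += 1
```

Calling `monitor("/var/tmp/resource-watch.log")` under nohup or in a screen session before starting the workload should leave a log you can read afterwards, even if your SSH session drops in the meantime. The log path here is just an example.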