High load but no CPU activity

Hi,

I have a two-node HA cluster where one node shows a high load (~30),
but no corresponding CPU activity and no waiting for I/O.

[CODE]top - 21:52:28 up 3 days, 4:51, 18 users, load average: 29.58, 29.34, 29.37
Tasks: 384 total, 1 running, 383 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 1.8 sy, 0.0 ni, 97.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 13189262+total, 13101358+used, 879040 free, 585960 buffers
KiB Swap: 13107200+total, 644608 used, 13042739+free. 12695278+cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7093 root 0 -20 27804 10868 3520 S 1.428 0.008 254:55.21 /usr/bin/atop -a -R -w /var/log/atop/atop_20190309 1
729 qemu 20 0 20.807g 1.067g 22876 S 0.455 0.848 184:58.00 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=mausdb,debug-threads=on -S -object secret,id=masterKey0,for+
774 qemu 20 0 7081016 1.001g 22528 S 0.041 0.796 12:04.02 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=sim,debug-threads=on -S -object secret,id=masterKey0,format+
1010 root 20 0 14348 2736 1984 S 0.041 0.002 24:10.42 top
18606 root 20 0 14364 2684 1956 R 0.041 0.002 0:01.50 top
30428 root 20 0 14364 2688 1936 S 0.041 0.002 24:35.08 top
[/CODE]

I have no idea where the high load comes from. There is one script that has been running much longer than expected,
but it’s doing nothing but waiting:

[CODE]ha-idg-1:~ # strace -p 31757
Process 31757 attached
wait4(-1,[/CODE]

man 2 wait says the process is waiting for a child.
How can I find out which child it is waiting for, or what that child is doing?
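
I guess one way would be to list the children of the waiting process and check what they are blocked on (just a generic attempt, using the PID from the strace above; <child-pid> is a placeholder):

[CODE]# list the children of the waiting script (31757 is the PID from the strace above)
ps --ppid 31757 -o pid,stat,wchan:32,cmd

# for a child shown in D (uninterruptible sleep) state, the kernel wait
# channel and stack usually reveal which call it is stuck in
cat /proc/<child-pid>/wchan; echo
cat /proc/<child-pid>/stack    # needs root[/CODE]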

The script writes its output to a file on an OCFS2 partition, but I can’t look into that folder:
I can enter the directory, but an ls -la just hangs.
This is already the second time this has happened, so I don’t think it’s a coincidence.

I don’t have any clue what’s going on, so any help is welcome.

Bernd

My first guess whenever there is high load is to check the disks. Load
average, as you may know, is the number of processes either running on a
CPU or waiting to run (on Linux this also includes tasks in
uninterruptible sleep), so it is essentially the length of the queue of
things needing to be processed, regardless of how much processing each one
actually does. As a result, the number is relative: with sixty-four (64)
logical CPUs this kind of load average is not nearly as bad as on a
single-CPU VM. If a process is waiting on a disk, CPU utilization may be
low, but anything else needing that disk will probably queue up behind the
first process; they may all end up waiting, the load grows, and that is a
sign of a disk that needs to perform better.
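
As a rough sanity check (assuming the usual procps/coreutils tools are installed), you can compare the load against the CPU count and see how many tasks are actually stuck waiting rather than computing:

[CODE]nproc                # number of logical CPUs
cat /proc/loadavg    # 1/5/15-minute load averages

# on Linux, tasks in uninterruptible sleep (D state) count toward the
# load average even though they use no CPU at all
ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /^D/'[/CODE]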

The fact that your ‘ls’ command never returns is a good indicator that
that particular disk is busy. Since it is OCFS2 it may not even be local;
it could be on a SAN or similar, so the problem may well be over there.
Your output also shows two KVM-based virtual machines (VMs), and while I
presume they are not using OCFS2 for their disks, I cannot tell from here.
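
If you want to confirm that the hung ‘ls’ (and your script’s child) are stuck waiting on storage rather than anywhere else, something like the following may help. This is just a sketch: the sysrq trigger dumps backtraces of all blocked tasks into the kernel log, so use it thoughtfully on a production node.

[CODE]# per-device utilization and latency; a device near 100 %util with high
# await while moving little data is a likely culprit
iostat -x 2

# dump backtraces of all tasks in uninterruptible sleep into dmesg
echo w > /proc/sysrq-trigger
dmesg | tail -n 100[/CODE]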


Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below.

If you want to send me a private message, please let me know in the
forum as I do not use the web interface often.

[QUOTE=ab;57131]Your output also shows two KVM-based virtual machines (VM) and
while I presume they are not using OCFS2 for their disks, I cannot tell
from here.
[/QUOTE]
Why do you presume they are not using OCFS2? Indeed they are.
Is it a problem?

Bernd

On 03/29/2019 05:54 AM, berndgsflinux wrote:
[QUOTE]
[QUOTE=ab;57131]Your output also shows two KVM-based virtual machines (VM) and
while I presume they are not using OCFS2 for their disks, I cannot tell
from here.[/QUOTE]
Why do you presume they are not using OCFS2? Indeed they are.
Is it a problem?
[/QUOTE]

I think what I meant is that I presume the VMs’ primary disk files (e.g.
.qcow2, .vmdk, or equivalent) are not hosted on something like OCFS2. I
do not think that would be a great idea; I would rather put the VM’s main
disk on something more traditional (e.g. XFS) and then mount an OCFS2
volume within the VM for Oracle cluster purposes, so the VM can access the
shared disk just like any physical system would.
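
As a rough illustration of that layout (all paths and device names below are made up), the guest would mount the shared OCFS2 LUN itself, while its own system disk stays on a plain local filesystem on the host:

[CODE]# on the host: the VM system disk is an ordinary image on a local
# XFS mount (hypothetical path)
ls -lh /var/lib/libvirt/images/example-guest.qcow2

# inside the guest: the shared cluster volume is mounted directly,
# e.g. a hypothetical /etc/fstab entry for a passed-through LUN:
#   /dev/disk/by-id/scsi-EXAMPLE-LUN  /srv/oracle  ocfs2  _netdev  0 0
mount /srv/oracle[/CODE]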


Good luck.


I have the VMs’ disk files (raw) on an OCFS2 volume.
Why do you think that’s not a great idea?

Bernd

On 04/02/2019 08:04 AM, berndgsflinux wrote:
[QUOTE]I have the VMs’ disk files (raw) on an OCFS2 volume.
Why do you think that’s not a great idea?[/QUOTE]

I am not an OCFS2 expert; I only know (or think I know) that it is
designed for Oracle DB use, and Oracle’s DB files are very different from
virtual machine disks, which can do whatever the applications inside them
want: writing temporary files (big ones, small ones), growing, shrinking,
etc. I believe OCFS2 is designed for multiple simultaneous writers, which
is quite different from what disks (virtual or physical) expect, namely a
single writer (the kernel) with a monopoly on the hardware. Virtual
machines typically assume their disks are fairly dedicated and reasonably
responsive, not encumbered by contention with other things on a remote
device over a network (sure, LUNs on SANs are “remote”, but SANs are
designed to handle that).

Maybe it’s fine, but it seems like an odd way to go. I’ve only ever used
OCFS2 for Oracle DB work, and not for several years now, but it seemed to
work well for that.
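
If you do keep the images on OCFS2, it is at least worth confirming exactly what sits where and with which mount options (generic checks; the image directory is just an example):

[CODE]# which filesystem actually hosts the VM image files?
df -T /var/lib/libvirt/images

# mount options of the OCFS2 volume(s)
mount -t ocfs2[/CODE]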


Good luck.


Hi Bernd,

I dropped OCFS2-based image files in favor of NPIV-based FC volumes a long time ago (and nowadays I am using Ceph in an OpenStack environment), but I experienced similar situations quite often in the early days. It got much better in later versions (starting around SLES 11 SP2, IIRC) and was very likely related to bugs in OCFS2. Usually it was a problem with the DLM (distributed lock manager).
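
If you want to see whether the DLM is implicated (just generic checks, nothing SLES-specific), the kernel log is usually where such hangs first show up:

[CODE]# OCFS2/DLM trouble normally leaves traces in the kernel log
dmesg | grep -iE 'ocfs2|dlm|hung task'

# the hung-task detector (if enabled) reports tasks blocked longer than this
cat /proc/sys/kernel/hung_task_timeout_secs[/CODE]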

If your servers are under support, I recommend opening a service request.

Best regards,
J

Hi Bernd,

I hope you have already tried checking for processes in uninterruptible sleep (D state). If so, try to eliminate that PID (cautiously) first. You may also kill processes lingering on an unreachable share (CIFS, NFS, etc.) after unmounting it (if possible). You can list D-state processes with: ps aux | awk '$8 ~ /D/'
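
A minimal sketch of that approach (the mount point is hypothetical, and umount -l / fuser -k are disruptive, so use them with care):

[CODE]MOUNT=/mnt/ocfs2-share          # hypothetical mount point

# which processes are holding the mount open?
fuser -vm "$MOUNT"

# if the storage is truly unreachable, a lazy unmount detaches it;
# processes still blocked inside it may need to be killed afterwards
umount -l "$MOUNT"
fuser -km "$MOUNT"[/CODE]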

Generally, if the load keeps increasing, it is usually just due to stuck I/O operations.

Hope it helps!

Regards,
loadsuse