Hi,
I have a two-node HA cluster where one node has a high load (~30).
However, there is no high CPU activity and no waiting for I/O.
[CODE]top - 21:52:28 up 3 days, 4:51, 18 users, load average: 29.58, 29.34, 29.37
Tasks: 384 total, 1 running, 383 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 1.8 sy, 0.0 ni, 97.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 13189262+total, 13101358+used, 879040 free, 585960 buffers
KiB Swap: 13107200+total, 644608 used, 13042739+free. 12695278+cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7093 root 0 -20 27804 10868 3520 S 1.428 0.008 254:55.21 /usr/bin/atop -a -R -w /var/log/atop/atop_20190309 1
729 qemu 20 0 20.807g 1.067g 22876 S 0.455 0.848 184:58.00 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=mausdb,debug-threads=on -S -object secret,id=masterKey0,for+
774 qemu 20 0 7081016 1.001g 22528 S 0.041 0.796 12:04.02 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=sim,debug-threads=on -S -object secret,id=masterKey0,format+
1010 root 20 0 14348 2736 1984 S 0.041 0.002 24:10.42 top
18606 root 20 0 14364 2684 1956 R 0.041 0.002 0:01.50 top
30428 root 20 0 14364 2688 1936 S 0.041 0.002 24:35.08 top
[/CODE]
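A minimal sketch for checking where the load comes from, assuming procps-ng `ps` and Linux `/proc`. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones. A load of ~30 with 97.5 % idle, 0.0 % wa and only 1 running task therefore suggests that roughly 30 tasks are stuck in D state. The helper name `list_d_state` is made up for this example.

[CODE]
#!/bin/sh
# Linux includes tasks in state R (running) and D (uninterruptible sleep)
# in the load average, so D-state tasks raise the load without using CPU.

# Print the ps header plus every task whose STAT starts with D,
# together with the kernel function it is sleeping in (WCHAN).
list_d_state() {
    ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/'
}

echo "load: $(cut -d ' ' -f 1-3 /proc/loadavg)"
echo "D-state tasks: $(ps -eo stat= | grep -c '^ *D')"
list_d_state
[/CODE]

If the D-state count is close to the load average, those tasks and their WCHAN values are the place to look.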
I have no idea where the high load comes from. One script is running much longer than expected,
but it is doing nothing except waiting:
[CODE]ha-idg-1:~ # strace -p 31757
Process 31757 attached
wait4(-1,
[/CODE]
According to man 2 wait, wait4() with a pid argument of -1 means the process is waiting for any of its children to change state.
How can I find out which child it is waiting for, or what that child is doing?
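One way to see the children, again only a sketch: it assumes procps-ng (for `ps --ppid`) and root access for `/proc/<pid>/stack`, which may also be missing on some kernels. It lists the direct children of the waiting script with their state and wait channel, then prints each child's kernel stack. `pstree -p 31757` shows the same process tree graphically. The function names `show_children` and `show_child_stacks` are hypothetical.

[CODE]
#!/bin/sh
# Direct children of a process: PID, state, kernel wait channel, command line.
show_children() {
    ps --ppid "$1" -o pid,stat,wchan:32,args
}

# Kernel stack of each direct child (root only; may not exist on every kernel).
show_child_stacks() {
    for child in $(ps --ppid "$1" -o pid=); do
        echo "== $child"
        cat "/proc/$child/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Only act when a PID is given, e.g. "sh children.sh 31757".
if [ -n "${1:-}" ]; then
    show_children "$1"
    show_child_stacks "$1"
fi
[/CODE]

The file name `children.sh` is arbitrary. A child in state D whose stack shows ocfs2 or dlm functions would point at the file system rather than the script itself.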
The script writes its output to a file on an OCFS2 partition, but I can’t look into that directory.
I can cd into it, but ls -la hangs.
This is already the second time this has happened, so I don’t think it’s a coincidence.
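The hanging ls -la can be inspected the same way. A minimal sketch, assuming procps-ng `pgrep`/`ps` and root access: it shows the state, wait channel and kernel stack of every process with the given name (default `ls`). The commented SysRq line dumps all blocked tasks to the kernel log and needs root. If the kernel’s hung-task detector is active, `dmesg` may already contain "task ... blocked for more than ... seconds" messages for these processes. The function name `show_blocked` is hypothetical.

[CODE]
#!/bin/sh
# State, wait channel and kernel stack of every process named exactly "$1".
show_blocked() {
    for pid in $(pgrep -x "$1"); do
        ps -o pid,stat,wchan:32,args -p "$pid"
        echo "-- kernel stack of $pid:"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(not readable; needs root)"
    done
}

show_blocked "${1:-ls}"

# Dump all tasks in uninterruptible (blocked) state to the kernel log (root):
#   echo w > /proc/sysrq-trigger; dmesg | tail -n 200
[/CODE]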
I have no clue what’s going on, so any help is welcome.
Bernd