High load average on SLES 11 SP2

Hi,

I have a SLES 11 SP2 server running on HP blade hardware, and it reports a high load average. Detailed output is below; please help me sort out this issue.

Hardware: HP ProLiant BL465c G6
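
For reference, the figures below come from sar (sysstat), sampled every 30 seconds - roughly these invocations:

sar -q 30 6       # run-queue size, process list size, load averages
sar -u 30 6       # overall CPU utilisation
sar -P ALL 30 2   # per-CPU utilisation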

15:16:59  runq-sz  plist-sz  ldavg-1  ldavg-5  ldavg-15
15:17:29        0       272     6.00     6.01      6.05
15:17:59        0       272     6.00     6.01      6.05
15:18:29        0       272     6.00     6.01      6.05
15:18:59        0       272     6.00     6.01      6.05
15:19:29        0       272     6.00     6.01      6.05
15:19:59        0       272     6.00     6.01      6.05
Average:        0       272     6.00     6.01      6.05

15:23:25  CPU  %user  %nice  %system  %iowait  %steal  %idle
15:23:55  all   0.04   0.00     0.06     0.00     0.00  99.90
15:24:25  all   0.02   0.00     0.07     0.00     0.00  99.91
15:24:55  all   0.01   0.00     0.07     0.00     0.00  99.92
15:25:25  all   0.02   0.00     0.04     0.00     0.00  99.93
15:25:55  all   0.09   0.00     0.07     0.00     0.00  99.85
15:26:25  all   0.01   0.00     0.06     0.01     0.00  99.92
Average:  all   0.03   0.00     0.06     0.00     0.00  99.90

15:36:11  CPU  %user  %nice  %system  %iowait  %steal   %idle
15:36:41  all   0.02   0.00     0.06     0.00     0.00   99.92
15:36:41    0   0.00   0.00     0.00     0.00     0.00  100.00
15:36:41    1   0.00   0.00     0.00     0.00     0.00  100.00
15:36:41    2   0.00   0.00     0.00     0.00     0.00  100.00
15:36:41    3   0.00   0.00     0.10     0.00     0.00   99.90
15:36:41    4   0.00   0.00     0.00     0.00     0.00  100.00
15:36:41    5   0.00   0.00     0.07     0.00     0.00   99.93
15:36:41    6   0.10   0.00     0.27     0.00     0.00   99.63
15:36:41    7   0.00   0.00     0.03     0.00     0.00   99.97

15:36:41  CPU  %user  %nice  %system  %iowait  %steal   %idle
15:37:11  all   0.00   0.00     0.07     0.01     0.00   99.92
15:37:11    0   0.00   0.00     0.00     0.00     0.00  100.00
15:37:11    1   0.00   0.00     0.03     0.00     0.00   99.97
15:37:11    2   0.00   0.00     0.00     0.03     0.00   99.97
15:37:11    3   0.00   0.00     0.03     0.00     0.00   99.97
15:37:11    4   0.00   0.00     0.00     0.00     0.00  100.00
15:37:11    5   0.00   0.00     0.07     0.00     0.00   99.93
15:37:11    6   0.07   0.00     0.33     0.00     0.00   99.60
15:37:11    7   0.00   0.00     0.07     0.00     0.00   99.93

I can also see that there are a few processes in D state:

 PID  USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM    TIME+  COMMAND
 504  root  18  -2  4116  632  524  D     0   0.0  0:00.00  modprobe
 505  root  18  -2  4332  628  524  D     0   0.0  0:00.00  modprobe
 555  root  18  -2  4188  640  536  D     0   0.0  0:00.00  modprobe
 557  root  18  -2  4124  644  536  D     0   0.0  0:00.00  modprobe
 565  root  18  -2  4108  628  524  D     0   0.0  0:00.00  modprobe
 572  root  18  -2  4156  644  536  D     0   0.0  0:00.00  modprobe
Total status D: 6

root 504 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v00000E11d0000B203sv0000103Csd00003305bc08sc80i00
root 505 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v00000E11d0000B204sv0000103Csd00003305bc08sc80i00
root 555 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v0000103Cd00003302sv0000103Csd00003305bc0Csc07i01
root 556 1 0 Aug15 ? 00:06:35 /sbin/modprobe -b pci:v00001166d00000036sv0000103Csd00003110bc06sc04i00
root 557 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v00001166d00000205sv00001166sd00000201bc06sc00i00
root 565 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v00001022d00001203sv00000000sd00000000bc06sc00i00
root 572 1 0 Aug15 ? 00:00:00 /sbin/modprobe -b pci:v00001166d00000103sv00000000sd00000000bc06sc04i00
root 603 1 0 Aug15 ? 00:06:36 /sbin/modprobe -b pci:v00001166d00000104sv0000103Csd00003111bc06sc04i00
root 641 1 0 Aug15 ? 00:06:36 /sbin/modprobe -b pci:v00001166d00000104sv00000000sd00000000bc06sc04i00

Hi sureshpanicker,

[QUOTE=sureshpanicker;15464]I have a SLES 11 SP2 server running on HP blade hardware, and it reports a high load average. […] Please help me sort out this issue.[/QUOTE]

“Load” relates to the number of processes waiting to be scheduled at a given point in time. The typical causes are “waiting for disk I/O to complete” or “other processes are blocking the processor(s)” - but neither your active CPU numbers nor your iowait percentage are high. On your system you have 6 processes that (per your description) seem to be hanging in state D (uninterruptible sleep, usually I/O; they cannot be scheduled for further execution) - which accounts for the 6.0 in your load figure. The remainder is caused by other processes waiting to be scheduled for various reasons from time to time.
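
Each process stuck in state D adds one to the load average even while the CPUs sit idle. A quick way to list them - plain ps and awk, nothing SLES-specific:

ps -eo pid,state,wchan,cmd | awk '$2 == "D"'   # processes in uninterruptible sleep
cat /proc/loadavg                              # the raw load figures the kernel reports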

You might want to check which hardware those modules are supposed to drive, and why loading them never completes. Is there anything in /var/log/messages or in the output of “dmesg”?
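
For example - the grep patterns are just a starting point, and the modalias below is one of the stuck ones from your listing:

dmesg | grep -i -e hpilo -e modprobe
grep -i modprobe /var/log/messages
grep 'v00000E11d0000B203' /lib/modules/$(uname -r)/modules.alias   # map the stuck modalias to a module name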

Regards,
Jens

Hi Jens,

firstly, thanks a lot for your suggestions. As you said, those D-state processes were making life miserable. We have identified the cause of the problem: the hpilo modules that fail to load belong to the RPM ‘hp-ilo-kmp-default-8.3.0_3.0.13_0.27-1’.

Solution: we decided to remove this RPM, as SUSE ships a built-in module for HP iLO.

We have now removed the RPM and rebooted; everything looks fine and I’m happy :slight_smile:
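
For anyone who hits the same issue, the steps were roughly:

rpm -qa | grep -i hp-ilo-kmp                    # confirm the KMP package is installed
rpm -e hp-ilo-kmp-default-8.3.0_3.0.13_0.27-1   # remove it, then reboot
lsmod | grep hpilo                              # afterwards the stock SLES hpilo module is loaded
ps -eo state,cmd | awk '$1 == "D"'              # and nothing is stuck in D state any more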

Regards
Suresh.

Hi Suresh,

thanks for the feedback and the details on the root cause!

Regards,
Jens