kswapd0 takes 100% CPU

Hi there,

after updating SLES 11 SP2 to SP3, I have several servers where kswapd0 takes 100% CPU (one core).
All servers are VMs running on the same Hyper-V cluster.
Some machines have this bug, some don't, and I don't know why.
After rebooting such a buggy system, everything runs fine for a few hours, in some cases for a few days.

All systems are up to date and use the same kernel.

Hi tojanov,

a few things I’d like to know about the situation:

  • How’s memory usage when kswapd hits the roof? (“free -m”)

  • Do you have swap space configured?

  • Does it go away after some time, all by itself?

  • Does purging the cache help? (“sync;echo 3 > /proc/sys/vm/drop_caches”)

  • Could you please report the actual kernel version(s) of the affected systems? This would be helpful if forwarding this case to our SUSE back-ends…

  • Do you have an active support subscription (not just updates: are you able to open support requests)? This may quickly turn into a case a support engineer ought to have a look at, since the actual cause can be one of several. (I’d rule out bad memory, as this has hit several servers after the update, so it may be a question of proper memory tuning or even a kernel problem.)
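
To judge whether purging the cache makes a difference, it can help to record the page cache size before and after. The following is only a minimal sketch: the function names cached_kb and purge_caches are made up for illustration, and the purge has to run as root. Writing 3 to drop_caches releases the page cache as well as dentries and inodes.

```shell
#!/bin/sh
# cached_kb [FILE]: print the "Cached" value (in kB) from a meminfo-style
# file. FILE defaults to /proc/meminfo. (Hypothetical helper name.)
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# purge_caches: flush dirty pages to disk, then drop page cache, dentries
# and inodes. Must be run as root. (Hypothetical helper name.)
purge_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Usage (as root), while kswapd0 is busy:
#   before=$(cached_kb); purge_caches; after=$(cached_kb)
#   echo "page cache: ${before} kB -> ${after} kB"
```

If kswapd0 calms down right after the purge, that is worth noting next to the two numbers.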

The usual conclusion for similar reports was that this can be caused by bad memory or a low memory situation.

Regards,
Jens

Hi Jens,

thanks for your quick reply.

The systems are running with 4 GB of memory and a 2 GB swap partition.
Swap does not seem to be needed, because memory usage is below 50% most of the time, including cache.

             total       used       free     shared    buffers     cached
Mem:          3825       1907       1918          0        248       1532
-/+ buffers/cache:        126       3699
Swap:         2055          0       2055
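
As a rough cross-check of the figures above, the "-/+ buffers/cache" row can be recomputed from the "Mem:" row. This is only a sketch with the values copied by hand; the results come out 1 MB off the 126 and 3699 that free printed, presumably because free rounds each value to whole megabytes separately.

```shell
#!/bin/sh
# Values in MB, copied from the "Mem:" row above.
total=3825
used=1907
free=1918
buffers=248
cached=1532

# Memory used by applications once buffers and page cache are subtracted.
app_used=$(( used - buffers - cached ))
# Memory that is free or could be reclaimed from buffers and page cache.
avail=$(( free + buffers + cached ))
# "used" as a percentage of total, cache included (integer division).
used_pct=$(( used * 100 / total ))

echo "applications: ${app_used} MB, free or reclaimable: ${avail} MB, used incl. cache: ${used_pct}%"
```

So the applications themselves hold only a small part of the RAM, and almost all of the "used" memory is cache.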

There is no increased load or any special activity that triggers the problem.
I can only resolve it by rebooting, and then only for a few hours; it never goes away by itself.

But thanks for your advice about purging the cache … it works!!! But what does that mean for me?

I use kernel 3.0.93-0.8-default and updated one machine to 3.0.101-0.8-default yesterday, but the problem still exists with this kernel version.

Of course we have active support subscriptions, but first I wanted to see whether this is a known bug in the community, and I hope to be able to fix it myself.

Hi tojanov,

[QUOTE=tojanov;17753]
[…] But thanks for your advice about purging the cache … it works!!! But what does that mean for me?

[…] Of course we have active support subscriptions, but first I wanted to see whether this is a known bug in the community, and I hope to be able to fix it myself.[/QUOTE]

it might be an upstream kernel problem; the net is full of reports on this. I’ll ask my SUSE contact for advice, since I cannot tell which fixes/improvements from later kernels were back-ported to the SLES kernels. That’s something the developers will have to answer.

I’ll get back to you once I have a proper reply, but due to the holiday season, this may take a few days…

Regards,
Jens

Hi tojanov,

could you collect /proc/meminfo and /proc/vmstat while kswapd is at 100% CPU (taken every 1s or so)? This would help in further diagnosing the effect. Is this by chance a NUMA machine?
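
One possible way to take those samples is a small loop like the one below. It is only a sketch: the function name collect_vm_stats, the log file names and the PROC_DIR override (handy for a dry run against copied files) are assumptions for illustration, not an official tool.

```shell
#!/bin/sh
# collect_vm_stats OUTDIR [COUNT] [INTERVAL]
# Appends COUNT snapshots of meminfo and vmstat to OUTDIR/meminfo.log and
# OUTDIR/vmstat.log, waiting INTERVAL seconds between snapshots.
# PROC_DIR (hypothetical override) defaults to /proc.
collect_vm_stats() {
    outdir=$1
    count=${2:-60}
    interval=${3:-1}
    proc_dir=${PROC_DIR:-/proc}

    mkdir -p "$outdir" || return 1

    i=0
    while [ "$i" -lt "$count" ]; do
        ts=$(date +%Y%m%d-%H%M%S)
        for f in meminfo vmstat; do
            {
                # Header line so the samples can be told apart later.
                echo "=== $ts sample $i ==="
                cat "$proc_dir/$f"
            } >> "$outdir/$f.log"
        done
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Usage, while kswapd0 is at 100% CPU (60 samples, one per second):
#   collect_vm_stats /tmp/kswapd-logs 60 1
```

Keeping each source in its own log, with a timestamp header per sample, should make it easier to compare the counters between snapshots.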

Regards,
Jens

Hi, and sorry for my late reply.

Here are the logs:

The Hyper-V hosts on which the SLES VMs run use NUMA. But the problem only exists on a few of them.

Your PM quota is full :wink:

I can only select the openSUSE platform when opening a bug in the bug tracker. There is no SLES entry. Is that right, and should I choose openSUSE instead?