After updating SLES 11 SP2 to SP3, I have several servers on which kswapd0 takes 100% CPU (one core).
All servers are VMs running on the same Hyper-V cluster.
Some machines have this bug, some do not, and I don’t know why.
After rebooting an affected system, everything runs fine for a few hours, in some cases for a few days.
All systems are up to date and use the same kernel.
A few things I’d like to know about the situation:
How’s memory usage when kswapd hits the roof? (“free -m”)
Do you have swap space configured?
Does it go away after some time, all by itself?
Does purging the cache help? (“sync;echo 3 > /proc/sys/vm/drop_caches”)
Could you please report the actual kernel version(s) of the affected systems? This would be helpful if forwarding this case to our SUSE back-ends…
Do you have an active support subscription (not only updates: can you open support requests)? This may quickly turn into a case a support engineer ought to look at, since the actual cause can be one of several. (I’d rule out bad memory, as it has hit several servers after the update, so this may be a question of proper memory tuning or even a kernel problem.)
The usual conclusion for similar reports was that this can be caused by bad memory or a low memory situation.
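The memory and cache checks asked about above can be combined into one small helper. This is only a sketch: the function names `meminfo_kb`, `report` and `purge_and_compare` are made up for illustration, and it assumes a root shell on a Linux system that exposes `/proc/meminfo` and `/proc/sys/vm/drop_caches`.

```shell
#!/bin/sh
# Sketch only: the function names are made up for illustration.

# Print the value (in kB) of one /proc/meminfo field, e.g. MemFree or Cached.
# $1 = field name, $2 = file to read (defaults to /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print MemFree and Cached on one line, prefixed with a label ($1).
report() {
    echo "$1: MemFree=$(meminfo_kb MemFree) kB, Cached=$(meminfo_kb Cached) kB"
}

# Needs root: flush dirty pages to disk, then drop page cache,
# dentries and inodes, printing the figures before and after.
purge_and_compare() {
    report "before"
    sync
    echo 3 > /proc/sys/vm/drop_caches
    report "after"
}
```

Running `purge_and_compare` while kswapd0 is busy shows how much memory the purge actually released, which is useful to include in a support request.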
The systems are running with 4 GB of memory and a 2 GB swap partition.
Swap does not seem to be needed, because memory usage is under 50% most of the time, including cache memory.
There is no increased load or any special activity that triggers the problem.
I can only resolve it for a few hours by rebooting; it never goes away by itself.
But thanks for your advice about purging the cache… it works! But what does that mean for me?
I use kernel 3.0.93-0.8-default and updated one machine to 3.0.101-0.8-default yesterday, but the problem still exists with this kernel version.
Of course we have active support subscriptions, but first I wanted to check whether this is a known bug in the community, in the hope of being able to fix it myself.
[QUOTE=tojanov;17753]
[…] But thanks for your advice about purging the cache… it works! But what does that mean for me?
[…] Of course we have active support subscriptions, but first I wanted to check whether this is a known bug in the community, in the hope of being able to fix it myself.[/QUOTE]
It might be an upstream kernel problem; the net is full of reports on this. I’ll ask my SUSE contact for advice, since I cannot tell which fixes/improvements from later kernels were back-ported to the SLES kernels. That’s something the developers will have to answer.
I’ll get back to you once I have a proper reply, but due to the holiday season, this may take a few days…
Could you collect /proc/meminfo and /proc/vmstat while kswapd is at 100% CPU (sampled every second or so)? This would help in further diagnosing the effect. Is this by chance a NUMA machine?
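One possible way to gather that data is a small sampling loop. This is a rough sketch, not an official SUSE tool: the function names `snapshot`, `collect` and `numa_nodes` and the log layout are made up for illustration. The NUMA check simply counts node directories under `/sys/devices/system/node`; a result of 0 or 1 suggests, but does not prove, a non-NUMA machine.

```shell
#!/bin/sh
# Sketch only: function names and the log layout are made up for illustration.

# Append one timestamped snapshot of meminfo and vmstat to a log file.
# $1 = log file, $2 = proc directory (defaults to /proc)
snapshot() {
    proc=${2:-/proc}
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "--- meminfo ---"
        cat "$proc/meminfo"
        echo "--- vmstat ---"
        cat "$proc/vmstat"
    } >> "$1"
}

# Take $2 snapshots, $3 seconds apart, into log file $1.
# $4 = optional proc directory, passed through to snapshot.
collect() {
    i=0
    while [ "$i" -lt "$2" ]; do
        snapshot "$1" "${4:-}"
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then sleep "$3"; fi
    done
}

# Count NUMA node directories.
# $1 = sysfs node directory (defaults to /sys/devices/system/node)
numa_nodes() {
    dir=${1:-/sys/devices/system/node}
    n=0
    for node in "$dir"/node[0-9]*; do
        [ -d "$node" ] && n=$((n + 1))
    done
    echo "$n"
}
```

For example, `collect /tmp/kswapd-samples.log 60 1` would record one minute of data at one-second intervals (the log path is only an example), and `numa_nodes` prints the number of nodes found.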