[QUOTE=ab;39171]On 08/18/2017 07:14 PM, alpha754293 wrote:[color=blue]
Code:
aes@aes3:~> free
             total       used       free     shared    buffers     cached
Mem:     132066080  122653184    9412896     155376         24   84712440
-/+ buffers/cache:   37940720   94125360
Swap:    268437500      35108  268402392
--------------------[/color]
This is showing what we would hope: that swap is basically unused. Sure,
35 MiB are in use, but that is next to nothing, and it is probably only
data that belongs in swap, like libraries loaded once at startup and never
needed again. You could tune swappiness further, but I can hardly imagine
it will make a big difference, since the system is not short of memory:
9 GiB is completely free, and another 94 GiB is used by cache and freeable.
[color=blue]
(I think that you meant /proc/sys/vm/swappiness and that is still at the
default value of 60.)[/color]
Change that if you want; sixty (60) is the default I have as well on my
boxes that I have not tuned, but again I doubt it matters much, since
the system is currently using almost no swap now that xorg is not
trying to use all of the virtual memory the system has available.[/quote]
Xorg isn’t using it, but the cache (pagecache and slab objects) is: 81.74 GiB of it, to be precise.
So when an application requests, say, ca. 70 GiB of RAM on a system with only 128 GB installed, the kernel pushes the new demand into swap, and this is where it becomes a problem.
See below for further commentary re: swappiness.
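(For anyone who wants to watch this happen live, something like the following should work; vmstat, free and watch ship with a stock install as far as I know. The si/so columns are swap-in/swap-out activity:)
Code:
# Print memory/swap counters every 5 seconds while the job ramps up;
# watch the si/so (swap-in/swap-out) and cache columns:
vmstat 5
# Or a coarser, human-readable view refreshed every 2 seconds:
watch -n 2 free -m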
[quote=ab][color=blue]
These were screenshots that I took of the terminal window (ssh) earlier.
You can see that on one system, it was caching 80.77 GiB, and on the
other, 94.83 GiB.
This is confirmed because when I run:
Code:
echo 3 > /proc/sys/vm/drop_caches
it clears the cache right away.[/color]
Yes, that makes sense, but I do not understand why there is a perceived
problem considering the system state now that xorg is stopped. The system
is not in need of memory, at least not at the time of the snapshot you took.[/quote]
Again, the root cause of the issue isn’t the swap in and of itself. It first manifested as such, especially with X running, but in runlevel 3 I was able to determine that the root cause is the kernel’s virtual memory subsystem caching pagecache and slab objects.
That is the heart of the issue.
If the comment re: perceived problem was about X running, then okay, sure. But if the comment is that this is a perceived problem at all, then no: it is a real and legitimate problem, and again, the root cause is the virtual memory management portion of the kernel that manages pagecache and slab objects.
Linux marks the RAM that pagecache and slab objects are cached into as used (which is TECHNICALLY true). What it DOESN’T do when an application demands that RAM, though, is release the cache (a la # echo 3 > /proc/sys/vm/drop_caches) so that the cached pagecache and slab objects return to the free memory pool and can be used by a USER application.
THAT is the part that it DOESN’T seem to be doing.
And that is, to be blunt and frank, stupid.
If you have user applications that require RAM, they should take precedence over the OS’ need/desire to cache pagecache and slab objects.
If there is an underlying performance issue such that it is SIGNIFICANTLY slower for the OS to load those objects WITHOUT them having been cached in RAM first, then you should be fixing THAT as the root cause of the issue, and NOT “masking” it by giving the caching of pagecache and slab objects an apparently HIGHER priority than user apps.
That is just dumb.
WHYYYY would you architect a system like that?
Yes, I realise that to Linux, cached objects in RAM = RAM in USE, but it should be intelligent enough to know what is TRULY being used vs. what is only cached, so that the cache can be cleared and the memory/RAM released back into the free/available pool for user apps to use.
THAT is the root cause of the underlying issue.
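(For reference, the manual workaround mentioned above looks like this; run as root. It is a blunt instrument, not a fix, and the kernel documentation suggests syncing first so dirty pages get written out and become droppable:)
Code:
# Flush dirty pages to disk so they can be dropped:
sync
# Drop pagecache, dentries and inodes (slab) in one shot:
echo 3 > /proc/sys/vm/drop_caches
# echo 1 drops pagecache only; echo 2 drops dentries and inodes (slab).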
[quote=ab][color=blue]
So, yes, Xorg was for some reason taking up a lot of RAM (still not
really sure why) so the technical support team from SuSE suggested that
I switch over to run level 3 to see if this is still happening, and it
is.[/color]
I think it may be useful to go back to runlevel five (5) again but without
running anything in it; I suspect it will not show the symptom unless you
run your program in there again. Something about that program is probably
causing the memory leak and asking the system for all possible memory, to
the detriment of everything else. Eventually, I would guess, the kernel
would use the Out Of Memory (OOM) killer to kill xorg in order to free up
what is obviously the biggest memory consumer.[/quote]
I disagree.
The console output of “free” actually tells you that one of the nodes has cached 81.74 GiB of objects, and the other has cached, well…it WAS 94.83 GiB; now it is 116.31 GiB.
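(A quick sketch of where those numbers come from, assuming the older free layout shown above, where the cached column is the seventh field of the Mem: line, in KiB:)
Code:
# Convert the cached column of free(1) from KiB to GiB:
free | awk '$1 == "Mem:" { printf "%.2f GiB cached\n", $7 / 1048576 }'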
Here is the output of ps aux for that node:
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 116160 4960 ? Ss Aug15 0:13 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root 2 0.0 0.0 0 0 ? S Aug15 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Aug15 0:02 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S Aug15 0:30 [kworker/u33:0]
root 8 0.0 0.0 0 0 ? S Aug15 0:03 [migration/0]
root 9 0.0 0.0 0 0 ? S Aug15 0:00 [rcu_bh]
root 10 0.0 0.0 0 0 ? S Aug15 4:47 [rcu_sched]
root 11 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/0]
root 12 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/1]
root 13 0.0 0.0 0 0 ? S Aug15 0:00 [migration/1]
root 14 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/1]
root 16 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/1:0H]
root 17 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/2]
root 18 0.0 0.0 0 0 ? S Aug15 0:00 [migration/2]
root 19 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/2]
root 21 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/2:0H]
root 22 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/3]
root 23 0.0 0.0 0 0 ? S Aug15 0:00 [migration/3]
root 24 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/3]
root 26 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/3:0H]
root 27 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/4]
root 28 0.0 0.0 0 0 ? S Aug15 0:00 [migration/4]
root 29 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/4]
root 31 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/4:0H]
root 32 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/5]
root 33 0.0 0.0 0 0 ? S Aug15 0:00 [migration/5]
root 34 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/5]
root 36 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/5:0H]
root 37 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/6]
root 38 0.0 0.0 0 0 ? S Aug15 0:00 [migration/6]
root 39 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/6]
root 41 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/6:0H]
root 42 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/7]
root 43 0.0 0.0 0 0 ? S Aug15 0:00 [migration/7]
root 44 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/7]
root 46 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/7:0H]
root 47 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/8]
root 48 0.0 0.0 0 0 ? S Aug15 0:04 [migration/8]
root 49 0.0 0.0 0 0 ? S Aug15 0:02 [ksoftirqd/8]
root 51 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/8:0H]
root 53 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/9]
root 54 0.0 0.0 0 0 ? S Aug15 0:00 [migration/9]
root 55 0.0 0.0 0 0 ? S Aug15 0:01 [ksoftirqd/9]
root 57 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/9:0H]
root 58 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/10]
root 59 0.0 0.0 0 0 ? S Aug15 0:00 [migration/10]
root 60 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/10]
root 62 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/10:0H]
root 63 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/11]
root 64 0.0 0.0 0 0 ? S Aug15 0:00 [migration/11]
root 65 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/11]
root 67 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/11:0H]
root 68 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/12]
root 69 0.0 0.0 0 0 ? S Aug15 0:00 [migration/12]
root 70 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/12]
root 72 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/12:0H]
root 73 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/13]
root 74 0.0 0.0 0 0 ? S Aug15 0:00 [migration/13]
root 75 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/13]
root 77 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/13:0H]
root 78 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/14]
root 79 0.0 0.0 0 0 ? S Aug15 0:00 [migration/14]
root 80 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/14]
root 82 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/14:0H]
root 83 0.0 0.0 0 0 ? S Aug15 0:00 [watchdog/15]
root 84 0.0 0.0 0 0 ? S Aug15 0:00 [migration/15]
root 85 0.0 0.0 0 0 ? S Aug15 0:00 [ksoftirqd/15]
root 87 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/15:0H]
root 88 0.0 0.0 0 0 ? S< Aug15 0:00 [khelper]
root 89 0.0 0.0 0 0 ? S Aug15 0:00 [kdevtmpfs]
root 90 0.0 0.0 0 0 ? S< Aug15 0:00 [netns]
root 91 0.0 0.0 0 0 ? S< Aug15 0:00 [perf]
root 92 0.0 0.0 0 0 ? S< Aug15 0:00 [writeback]
root 93 0.0 0.0 0 0 ? S< Aug15 0:00 [kintegrityd]
root 94 0.0 0.0 0 0 ? S< Aug15 0:00 [bioset]
root 95 0.0 0.0 0 0 ? S< Aug15 0:00 [crypto]
root 96 0.0 0.0 0 0 ? S< Aug15 0:00 [kblockd]
root 101 0.0 0.0 0 0 ? S< Aug15 0:00 [kgraft]
root 102 0.0 0.0 0 0 ? S Aug15 0:00 [khungtaskd]
root 104 0.0 0.0 0 0 ? S Aug15 0:06 [kswapd0]
root 105 0.0 0.0 0 0 ? S Aug15 0:02 [kswapd1]
root 106 0.0 0.0 0 0 ? SN Aug15 0:00 [ksmd]
root 107 0.0 0.0 0 0 ? SN Aug15 0:00 [khugepaged]
root 108 0.0 0.0 0 0 ? S Aug15 0:00 [fsnotify_mark]
root 118 0.0 0.0 0 0 ? S< Aug15 0:00 [kthrotld]
root 128 0.0 0.0 0 0 ? S< Aug15 0:00 [kpsmoused]
root 129 0.0 0.0 0 0 ? S Aug15 0:00 [print/0]
root 130 0.0 0.0 0 0 ? S Aug15 0:00 [print/1]
root 150 0.0 0.0 0 0 ? S< Aug15 0:00 [deferwq]
root 151 0.0 0.0 0 0 ? S Aug15 0:21 [kworker/14:1]
root 188 0.0 0.0 0 0 ? S Aug15 0:00 [kauditd]
root 301 0.0 0.0 0 0 ? S< Aug15 0:00 [ata_sff]
root 304 0.0 0.0 0 0 ? S Aug15 0:00 [khubd]
root 310 0.0 0.0 0 0 ? S< Aug15 0:00 [ttm_swap]
root 348 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_0]
root 349 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_0]
root 350 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_wq_0]
root 352 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_1]
root 353 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_1]
root 354 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_2]
root 355 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_2]
root 356 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_3]
root 357 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_3]
root 358 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_4]
root 359 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_4]
root 360 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_5]
root 361 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_5]
root 362 0.0 0.0 0 0 ? S Aug15 0:00 [scsi_eh_6]
root 363 0.0 0.0 0 0 ? S< Aug15 0:00 [scsi_tmf_6]
root 388 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/0:1H]
root 389 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/11:1H]
root 390 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/14:1H]
root 391 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/15:1H]
root 392 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/8:1H]
root 397 0.0 0.0 0 0 ? S< Aug15 0:00 [bioset]
root 398 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/9:1H]
root 410 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/10:1H]
root 417 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-genwork-1]
root 418 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-submit-1]
root 419 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-delalloc-]
root 420 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-fixup-1]
root 422 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-met]
root 423 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-rmw-1]
root 424 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-rai]
root 425 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-met]
root 427 0.0 0.0 0 0 ? S Aug15 0:01 [btrfs-freespace]
root 428 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-delayed-m]
root 429 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-cache-1]
root 430 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-readahead]
root 431 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-flush_del]
root 432 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-qgroup-re]
root 433 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/12:1H]
root 434 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/13:1H]
root 435 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-cleaner]
root 436 0.0 0.0 0 0 ? S Aug15 0:35 [btrfs-transacti]
root 525 0.0 0.0 43400 9568 ? SLs Aug15 0:02 /usr/lib/systemd/systemd-journald
root 533 0.0 0.0 21732 960 ? Ss Aug15 0:14 /sbin/dmeventd -f
root 539 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-2]
root 545 0.0 0.0 0 0 ? S Aug18 0:05 [kworker/10:2]
root 557 0.0 0.0 42940 2436 ? Ss Aug15 0:00 /usr/lib/systemd/systemd-udevd
root 759 0.0 0.0 0 0 ? S< Aug15 0:00 [edac-poller]
root 760 0.0 0.0 12032 3792 ? Ss Aug15 0:39 /usr/sbin/haveged -w 1024 -v 0 -F
root 925 0.0 0.0 0 0 ? S< Aug15 0:00 [kvm-irqfd-clean]
root 1805 0.0 0.0 0 0 ? SN Aug15 0:00 [kipmi0]
root 1843 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-genwork-1]
root 1844 0.0 0.0 0 0 ? S Aug15 0:02 [btrfs-submit-1]
root 1845 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-delalloc-]
root 1846 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-fixup-1]
root 1847 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-1]
root 1848 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-met]
root 1849 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-rmw-1]
root 1850 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-rai]
root 1851 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-endio-met]
root 1852 0.0 0.0 0 0 ? S Aug15 0:04 [btrfs-endio-wri]
root 1853 0.0 0.0 0 0 ? S Aug15 0:01 [btrfs-freespace]
root 1854 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-delayed-m]
root 1855 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-cache-1]
root 1856 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-readahead]
root 1857 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-flush_del]
root 1858 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-qgroup-re]
root 1866 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-cleaner]
root 1867 0.0 0.0 0 0 ? S Aug15 0:33 [btrfs-transacti]
message+ 2073 0.0 0.0 42920 2924 ? SLs Aug15 0:03 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
avahi 2076 0.0 0.0 20076 1724 ? Ss Aug15 0:24 avahi-daemon: running [aes2.local]
root 2078 0.0 0.0 24964 2708 ? Ss Aug15 0:00 /usr/sbin/smartd -n
root 2081 0.0 0.0 19304 1232 ? Ss Aug15 0:42 /usr/sbin/irqbalance --foreground
nscd 2084 0.0 0.0 802360 1456 ? Ssl Aug15 0:02 /usr/sbin/nscd
root 2085 0.0 0.0 29488 3184 ? SLs Aug15 0:00 /usr/lib/wicked/bin/wickedd-dhcp6 --systemd --foreground
root 2088 0.0 0.0 29488 3424 ? SLs Aug15 0:00 /usr/lib/wicked/bin/wickedd-dhcp4 --systemd --foreground
root 2099 0.0 0.0 29488 3188 ? SLs Aug15 0:00 /usr/lib/wicked/bin/wickedd-auto4 --systemd --foreground
root 2121 0.0 0.0 20096 1584 ? Ss Aug15 0:02 /usr/lib/systemd/systemd-logind
root 2122 0.0 0.0 4440 768 tty1 Ss+ Aug15 0:00 /sbin/agetty --noclear tty1 linux
root 2126 0.0 0.0 0 0 ? S 17:45 0:00 [kworker/0:0]
root 2149 0.0 0.0 337852 2484 ? SLsl Aug15 0:00 /usr/sbin/rsyslogd -n
root 2151 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/4:1H]
root 2152 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/6:1H]
root 2156 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/5:1H]
root 2158 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/2:1H]
root 2159 0.0 0.0 29612 3568 ? SLs Aug15 0:00 /usr/sbin/wickedd --systemd --foreground
root 2169 0.0 0.0 29516 3280 ? SLs Aug15 0:00 /usr/sbin/wickedd-nanny --systemd --foreground
root 2207 0.0 0.0 0 0 ? S Aug15 0:00 [kworker/u34:1]
root 2322 0.0 0.0 0 0 ? S< Aug15 0:00 [kworker/3:1H]
root 2991 0.0 0.0 437608 15460 ? SLsl Aug15 0:00 /usr/sbin/libvirtd --listen
root 3022 0.0 0.0 46896 3136 ? Ss Aug15 0:00 /usr/sbin/sshd -D
root 3100 0.0 0.0 19608 1212 ? Ss Aug15 0:01 /usr/lib/postfix/master -w
postfix 3102 0.0 0.0 21856 2268 ? S Aug15 0:00 qmgr -l -t fifo -u
root 3129 0.0 0.0 18820 1492 ? Ss Aug15 0:00 /usr/sbin/cron -n
root 3472 0.0 0.0 0 0 ? S Aug19 0:00 [kworker/u32:1]
root 3538 0.6 0.1 812836 141660 ? Ssl Aug15 47:40 /usr/ansys_inc/shared_files/licensing/linx64/ansysli_server
root 3550 0.0 0.0 19064 3592 ? Ss Aug15 1:58 /usr/ansys_inc/shared_files/licensing/linx64/ansysli_monitor -monitor 3538 -restart_port_timeout 15
root 3584 0.0 0.0 16780 2692 ? S Aug15 0:04 /usr/ansys_inc/shared_files/licensing/linx64/lmgrd -c /usr/ansys_inc/shared_files/licensing/license_files -l /usr/ansys_inc/shared_files/licensing/license.log
root 3585 0.0 0.0 127816 7376 ? Ssl Aug15 0:32 ansyslmd -T aes2 11.13 3 -c :/usr/ansys_inc/shared_files/licensing/license_files: -srv LgFV2wwMa2iyCVChj6LuclIeIH7uSthmAgsCjVUTJXsEnEgIdOrnsb832BA3Cnw --lmgrd_start 5993c175 -vdrestart 0
root 4180 0.0 0.0 0 0 ? S 18:00 0:00 [kworker/7:0]
root 5142 0.0 0.0 0 0 ? S< Aug16 0:00 [kworker/1:1H]
root 5601 0.0 0.0 0 0 ? S 10:00 0:00 [kworker/1:0]
root 5943 0.0 0.0 0 0 ? S Aug15 0:00 [btrfs-worker-4]
root 6095 0.0 0.0 0 0 ? S< Aug16 0:00 [kworker/7:1H]
root 6123 0.0 0.0 0 0 ? S 18:15 0:00 [kworker/9:1]
root 6960 0.0 0.0 0 0 ? S 14:15 0:00 [kworker/6:0]
root 6961 0.0 0.0 0 0 ? S Aug17 0:11 [kworker/15:2]
root 7459 0.0 0.0 0 0 ? S Aug15 0:05 [kworker/u34:2]
root 7532 0.0 0.0 87676 4088 ? Ss Aug15 0:00 sshd: ewen [priv]
ewen 7541 0.0 0.0 87676 1780 ? S Aug15 0:00 sshd: ewen@pts/0
ewen 7542 0.0 0.0 14316 3204 pts/0 Ss Aug15 0:00 -bash
root 7848 0.0 0.0 87676 4088 ? Ss Aug16 0:00 sshd: ewen [priv]
ewen 7857 0.0 0.0 87676 1780 ? S Aug16 0:00 sshd: ewen@pts/1
ewen 7858 0.0 0.0 14316 3228 pts/1 Ss Aug16 0:00 -bash
root 8117 0.0 0.0 0 0 ? S 18:30 0:00 [kworker/11:1]
root 8120 0.0 0.0 0 0 ? S 18:30 0:00 [kworker/2:1]
root 8904 0.0 0.0 0 0 ? S 14:30 0:01 [kworker/1:1]
root 9949 0.0 0.0 0 0 ? S Aug16 0:00 [kworker/14:2]
root 10053 0.0 0.0 0 0 ? S 18:45 0:00 [kworker/8:0]
postfix 10362 0.0 0.0 21464 1376 ? S 18:47 0:00 pickup -l -t fifo -u
root 10405 0.0 0.0 0 0 ? S Aug19 0:04 [kworker/15:0]
root 10968 0.0 0.0 0 0 ? S 18:51 0:00 [btrfs-endio-wri]
root 12052 0.0 0.0 0 0 ? S 19:00 0:00 [kworker/9:2]
ewen 12054 0.0 0.0 118380 80776 pts/0 S+ Aug16 0:02 /usr/ansys_inc/v180/CFX/bin/../tools/perl-5.8.0-1/bin/Linux-x86_64/perl -Sx /usr/ansys_inc/v180/CFX/bin/cfx5solve -batch -par-local -part 16 -def Transient.def
ewen 12190 0.3 0.0 14368 1328 pts/0 S+ Aug16 21:15 /usr/ansys_inc/v180/commonfiles/MPI/IBM/9.1.4.2/linx64/bin/mpirun -f /export/home/work/Aerosmart/International 9200i 2006 (solid model) flow regime (wo bubble) CFX mesh2a_files/dp0/CFX-1/CFX/Transient_001.dir/appfile
ewen 12193 0.2 0.0 48240 1692 pts/0 S+ Aug16 18:12 /usr/ansys_inc/v180/commonfiles/MPI/IBM/9.1.4.2/linx64/bin/mpid 0 0 151061506 192.168.1.157 30519 12190 /usr/ansys_inc/v180/commonfiles/MPI/IBM/9.1.4.2/linx64
ewen 12288 99.6 3.0 4422964 4068788 pts/0 Rl+ Aug16 6892:24 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12289 99.8 0.2 474100 366268 pts/0 R+ Aug16 6904:11 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12290 99.8 0.2 464608 356372 pts/0 R+ Aug16 6904:49 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12291 99.7 0.2 456024 345484 pts/0 R+ Aug16 6897:07 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12292 99.7 0.2 473756 363616 pts/0 R+ Aug16 6898:48 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12293 99.7 0.2 473668 358092 pts/0 R+ Aug16 6898:28 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12294 99.7 0.2 463792 352588 pts/0 R+ Aug16 6899:14 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12295 99.8 0.2 454192 346016 pts/0 R+ Aug16 6904:35 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12296 99.7 0.2 458876 349376 pts/0 R+ Aug16 6897:50 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12297 99.7 0.2 460728 351572 pts/0 R+ Aug16 6898:57 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12298 99.8 0.2 461312 353512 pts/0 R+ Aug16 6904:47 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12299 99.8 0.2 458340 347260 pts/0 R+ Aug16 6903:56 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12300 99.8 0.2 475320 362708 pts/0 R+ Aug16 6903:35 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12301 99.8 0.2 476804 367708 pts/0 R+ Aug16 6902:37 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12302 99.8 0.2 473872 364996 pts/0 R+ Aug16 6903:42 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
ewen 12303 99.7 0.2 478276 368140 pts/0 R+ Aug16 6900:52 /usr/ansys_inc/v180/CFX/bin/linux-amd64/ifort/solver-mpi.exe -par -pri 2 -outopt 0 -nojob
root 14379 0.0 0.0 0 0 ? S 19:17 0:00 [btrfs-worker-2]
root 15671 0.0 0.0 0 0 ? S Aug19 0:04 [kworker/10:1]
root 16017 0.0 0.0 0 0 ? S 19:30 0:00 [kworker/8:2]
root 16864 0.0 0.0 0 0 ? S 15:30 0:00 [kworker/u32:2]
root 17085 0.0 0.0 0 0 ? S Aug18 0:02 [kworker/11:2]
root 17146 0.0 0.0 0 0 ? S Aug19 0:05 [kworker/12:2]
root 17956 0.0 0.0 0 0 ? S 19:45 0:00 [kworker/5:2]
root 17959 0.0 0.0 0 0 ? S 19:45 0:00 [kworker/4:2]
root 18147 0.2 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-2]
root 18148 0.2 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-3]
root 18149 0.1 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-4]
root 18150 0.1 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-5]
root 18151 0.1 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-6]
root 18152 0.0 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-7]
root 18153 0.0 0.0 0 0 ? S 19:46 0:00 [btrfs-worker-8]
ewen 18249 57.1 0.0 26696 1864 pts/1 RL+ 19:46 0:00 ps aux
root 18790 0.0 0.0 0 0 ? S 15:45 0:00 [kworker/2:0]
root 20277 0.0 0.0 0 0 ? S Aug19 0:00 [kworker/12:0]
root 22717 0.0 0.0 0 0 ? S 16:15 0:00 [kworker/0:1]
root 23515 0.0 0.0 0 0 ? S 12:15 0:01 [kworker/7:2]
root 23790 0.0 0.0 0 0 ? S Aug18 0:08 [kworker/13:1]
root 24125 0.0 0.0 0 0 ? S 08:15 0:01 [kworker/5:0]
root 26351 0.0 0.0 0 0 ? S Aug18 0:00 [kworker/u33:2]
root 26967 0.0 0.0 0 0 ? S Aug19 0:01 [kworker/5:1]
root 27446 0.0 0.0 0 0 ? S 12:45 0:00 [kworker/3:2]
root 27935 0.0 0.0 0 0 ? S Aug16 0:00 [kworker/13:2]
root 28047 0.0 0.0 0 0 ? S 08:45 0:02 [kworker/4:0]
root 28649 0.0 0.0 0 0 ? S 17:00 0:00 [kworker/6:1]
root 30856 0.0 0.0 0 0 ? S 05:00 0:00 [kworker/4:1]
root 31394 0.0 0.0 0 0 ? S 13:15 0:01 [kworker/3:0]
Code:
$ cat /proc/sys/vm/vfs_cache_pressure
200
$ cat /proc/sys/vm/swappiness
60
I highly doubt 116.31 GiB of cached objects is a “perceived” problem.
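(For reference, a minimal sketch of making these two tunables persistent across reboots; the values shown are simply the current ones from above, not a recommendation:)
Code:
# Append the tunables to /etc/sysctl.conf (as root):
cat >> /etc/sysctl.conf <<'EOF'
vm.vfs_cache_pressure = 200
vm.swappiness = 60
EOF
# Apply without rebooting:
sysctl -p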
[quote=ab][color=blue]
I found this site:
http://www.linuxatemyram.com/
The table near the bottom shows cached objects in RAM counted as used
RAM. For some strange reason, someone, in their infinite wisdom, decided
to denote cached objects (pagecache and slab objects) as such, and so the
memory manager (vm) WON’T automatically release that RAM via a cache
purge, which means that large-memory programs like scientific/engineering
analysis programs end up needing to swap because not enough RAM is
free/available.
Your last statement is correct.
I think that even when you install SLES, if you try to install it
without swap, it gives you a warning against doing something like that.[/color]
Yes, it does, and I think it does because nobody has bothered to remove it,
and also because, in general, it is good to have a little swap partition
(2 GiB is standard, I think) just in case.
[color=blue]
Being that I also come from a predominantly Windows background (more so
than Linux or Solaris), I know that Windows REALLY hates it if it
doesn’t have swap available, even with large memory systems.
I’m not sure if Linux will behave “appropriately” in the absence of a
swap partition.[/color]
Every box I’ve built in the past several years (ten or so?) has had
minimal or no swap at all, except for my laptop, where I have swap simply
because I hibernate (suspend to disk) often. Probably half of the boxes
I’ve set up in that time have had no swap at all, and all of them are
either still running or have been retired and replaced because of
hardware dying. I stopped using swap, as much as possible, years ago
because of the issues I’ve mentioned before, where performance suffered
so badly when some program (xorg in your case) got out of control that
using the system, even to kill the problematic process, was too much of a
problem to stand. Anyway, Linux seems to do fine, particularly if you
tune the swappiness down to one (1) or zero (0), since that should mean
swap is only used as a last resort; my laptop has it set to zero (0).
[color=blue]
Conversely, nowadays with PCI Express SSDs, swapping isn’t quite as big
of a deal as it once was. Still sucks, but it’s MUCH better than the
days of asynchronous swapping on a mechanically rotating hard drive at
3-5 MB/s. (I don’t have my PCIe SSD installed in it right now. I wanted
to see how well it would do without it in case I end up installing an
Infiniband card instead.)[/color]
Sure, but 250+ GiB of it? If your system needs that much swap, even if
you are using SSDs, something is amiss, and writing at one GB/s is still
not that fast compared to what RAM can do. I just did a test on a REALLY
old box that was never server-class hardware, and even it can write to
RAM at 3 GiB/s; it would probably take striped SSDs today to keep up with
that, and modern RAM should be able to go much faster, maybe an order of
magnitude or more.[/quote]
So…yes and no.
I keep the swap around only because, while I know how much memory the cases I am currently running take, that’s generally NOT the case (i.e. I usually don’t have a good estimate of how much memory an analysis will need before I submit the run/job).
The system has been spec’d in anticipation of larger memory runs, but there is also a possibility that even 128 GB will be insufficient. Yay engineering?
So I upgraded the RAM to a cost-effective point: it isn’t the most RAM the system can take, but it isn’t the lowest-cost option either.
In my case, swap exists in the event of an analysis requiring more memory than is physically available.
[quote=ab][color=blue]
To me, /proc/sys/vm/swappiness tells the system how “often” it swaps.
What I really want it to do isn’t to swap; it is to clear the cache so
that memory is freed up and swap wouldn’t even be an issue. That’s what
I was really going for by writing to /proc/sys/vm/vfs_cache_pressure
(since then, I’ve also edited sysctl so that the change will be
permanent, but for now I am writing to it manually to test which setting
works the way I would want/like it to).[/color]
Set swappiness to zero (0) and it will only be used if really needed; the
system may still use cache, and in my testing it frees that at the drop of
a hat for any user process that needs it (‘root’-owned or otherwise), but
at least swap will never be used until that time. My laptop runs
VMs, monster Java processes, this Thunderbird tool (subscribed to a
hundred groups and a half-dozen e-mail accounts), Firefox (with a dozen
tabs), Tomboy with a few thousand notes, Pidgin, konsole with a dozen tabs
and infinite scrollback, and as much else as I can throw at it, and it
never uses swap unless a process runs away, at which time I still wish it
were dead despite a decent, but not new/modern, SSD.
I need to read up more on the vfs_cache_pressure stuff, but ultimately I
would take your system, cut back the swap as much as possible (you’re
into high performance, so use high-performance (non-swap) memory only, by
sizing for it as you have done), and then keep xorg from building up
memory by not running that program within it, or by not running xorg at
all (it is a server, after all; you do not need a GUI full time), but
that is just me.
It is interesting hearing about your experience and what you are seeing,
particularly in this little HPC environment, so no matter which route you
choose, thank you for sharing what you have so far.
–
Good luck.[/QUOTE]
You’re welcome.
Yeah, I can try changing the swappiness to zero as well.
It was my understanding that it only controlled how likely it was going to swap, not whether or not it was going to swap.
Put it this way: even with 128 GB of RAM, SLES has been able to consume ALL of it in one form or another. I just wish it were more biased towards user apps rather than OS/kernel vm caching.
I understand, at a very high level, why this problem exists in the Linux kernel (it doesn’t distinguish between RAM used by user apps and RAM used for caching; it just sees the RAM as used), but it should have been architected more intelligently, such that when an application requests more RAM than is currently available, one of the first things the OS does is clear out the cache, which it currently doesn’t seem to do automatically.
(It was also my interpretation that vfs_cache_pressure was supposed to do exactly that, but with a value of ‘200’ set in /proc/sys/vm/vfs_cache_pressure it is STILL caching like mad, so something tells me there is more testing to be done. The downside is that my suite of tests (via batch processing/shell script) takes about 4.5 days per pass, so I am reluctant to test system settings one slow run at a time; I need something that loads the system up quickly and just as quickly gets the OS caching. See the sketch below.)
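Something along these lines is what I have in mind; it is only a rough sketch with a made-up scratch path, and the sizes would need adjusting (keep the allocation below installed RAM or the OOM killer may step in, and use a disk-backed path rather than tmpfs, or step 1 consumes RAM directly instead of filling the pagecache):
Code:
# 1. Fill the pagecache quickly by writing and re-reading a large
#    scratch file (64 GiB here; adjust count to taste):
dd if=/dev/zero of=/var/tmp/cachefill bs=1M count=65536
cat /var/tmp/cachefill > /dev/null
# 2. In a second terminal, watch cache vs. swap once per second:
vmstat 1
# 3. Grab a big block of anonymous memory; tail has to buffer its
#    whole input because /dev/zero contains no newlines, so this
#    allocates roughly 80 GiB before exiting:
head -c 80G /dev/zero | tail > /dev/null
# 4. Clean up:
rm /var/tmp/cachefill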