Running SLES 12 SP5. Sometimes when running a simple command there is a pause before it runs; this only seems to happen the first time you run the command.
Below is an example. Looking at the real time, the first run took over 4 seconds; the second run took 3 milliseconds.
This happens randomly, but usually only the first time the command is run.
Example:
Server2:/var/log # time free
             total       used       free     shared    buffers     cached
Mem:      41189156   27752424   13436732      26680     264612   26356572
-/+ buffers/cache:    1131240   40057916
Swap:      6225912      66816    6159096

real    0m4.298s
user    0m0.002s
sys     0m0.001s

server2:/var/log # time free
             total       used       free     shared    buffers     cached
Mem:      41189156   27752424   13436732      26680     264616   26356568
-/+ buffers/cache:    1131240   40057916
Swap:      6225912      66816    6159096
I’m not sure how free calculates the Swap space, but if it needs to read the swap file or partition on the HDD, then this would explain your issue.
There are many cases where the system seems slow because hard drives are sleeping and need to be woken up first. So this would be my first guess, as it's so common.
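If you want to see what free actually reads (assuming strace is installed on the box), tracing its file accesses would settle it:

strace -e trace=file free

As far as I know, free just parses /proc/meminfo rather than reading the swap device itself, but the binary and its libraries still have to come off disk the first time they run, so a sleeping or slow disk would still fit.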
The other effect here is the page cache: Linux keeps recently used disk blocks in otherwise unused RAM, so a command whose binary and data are already cached runs enormously faster. Conveniently, the man page of the free command is a decent place to get a quick description of it.
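If you want to reproduce the cache effect yourself, here is a rough sketch (run as root; dropping caches is safe but will briefly slow the whole box down while the cache refills):

sync
echo 3 > /proc/sys/vm/drop_caches    # flush page cache, dentries and inodes
time free                            # cold run: binary, libraries and metadata come off disk
time free                            # warm run: everything is served from the page cache

If the cold run shows the multi-second delay and the warm run doesn't, you're seeing exactly the effect described above.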
Thank you for these replies. This seems to happen randomly with multiple commands like lvs, pvs, df etc.
This server is a VM on VMware vSphere, so it uses VMDKs as disks on a SAN.
From your comments I'm thinking of setting up a new datastore and migrating this server's storage, just in case there is a problem with the current one.
Again, that doesn't sound random. Those are all commands related to storage, so it's not random; it's related to your storage in some way.
Not sure if this is on-premises or in a proprietary cloud. The fact remains: if the disk lives somewhere without an ultra-fast RAID setup like the ones cloud providers use, then your disk is simply too slow from your perspective.
It's a common issue and nothing to worry about, unless those couple of seconds on the first run are really a deal breaker for you.
As hinted above, it depends on where it is now. If it is currently in a proprietary cloud, it should already be optimised as much as possible (assuming they know what they are doing), so moving it wouldn't fix anything.
However, if you manage it yourself and don't have a particularly fast setup, that is most likely the underlying issue.
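If you want to put a number on it, a crude way to measure uncached read latency from inside the VM is a direct, read-only pass over the device (double-check the device name first; /dev/sda here is just an assumption):

dd if=/dev/sda of=/dev/null bs=4k count=1000 iflag=direct

If that crawls, or if a tool like ioping (where available) shows very high access times, the storage path is your prime suspect.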
Generally speaking, most people just live with this amount of delay. If it's a physical box, maybe tweak the power-saving settings a bit if the hardware is truly going to sleep, but otherwise just live with it. If you increase the amount of RAM on your VMs, the disk cache will be larger, so it will take longer for things to get flushed out of it (though data access will still evict your binaries, so that doesn't help much if you do a lot of data wrangling). Moving to solid state could help, since there is no literal spin-up time (though with a SAN you'd need to check whether and how it supports that, and it would be pricey), but if your SAN isn't fast enough or is oversubscribed, then you won't really be able to do much.
If 5 seconds of latency is an issue for you, the cheaper answer might be to get some physical boxes with SSDs and interact mainly with those.
Sorry, I probably didn't make things clear. The server (VM) is in our own data centre on our own local SAN, and it only started having the issue about 2 months ago; it was fine before that. It's random in the sense that it does not happen all the time: I can go onto the server and everything responds quickly, then later I run a simple command and it takes 4 seconds, yet rerunning the same command straight away responds within a second. This server has 8 vCPUs and 40 GB of RAM, and annoyingly I cannot see any obvious issues with CPU, RAM, network or disk I/O. But I agree with some of the comments that it does look like a disk/disk-cache issue, even though the vSphere environment is not reporting any errors. I've been investigating this for a couple of weeks, so I'm open to all suggestions.
Disk speed, disk cache, and whether other VMs are having periodic busy cycles and bogging things down (I/O will be what's bogging down, and monitoring can have trouble showing that).
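From inside the guest, iostat from the sysstat package will at least show per-device latency, e.g.:

iostat -x 5

Watch the await columns while a stall happens; if they spike at the same moment, the delay is in the storage path rather than in the guest itself.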
The other thing with vSphere is that it will share identical memory pages between VMs, and VM memory pages can also get paged out on the host if you're too overcommitted.
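If the VMware tools are installed in the guest, you can also check whether ballooning or host-level swapping is reclaiming memory from this VM (these subcommands exist in open-vm-tools, but verify them against your version):

vmware-toolbox-cmd stat balloon
vmware-toolbox-cmd stat swap
vmware-toolbox-cmd stat memlimit

Non-zero balloon or swap values mean the host is taking memory back, which tends to push the guest into dropping its page cache between runs.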
Another thought might be that some update of something (whatever it might be) changed the power-saving behaviour. Maybe it became more aggressive and it wasn't noticed. Servers usually wouldn't use such a feature, but if it's a self-set-up data centre, then who knows what might've been missed?
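A quick way to rule out CPU frequency scaling inside the guest, assuming the cpufreq interface is even exposed on a VM (it often isn't), would be:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null
cpupower frequency-info

On a VM, though, the ESXi host's power management policy matters far more than anything inside the guest, so that is the more useful place to look.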