Semi-random booting/stability issues

Hello,

Over the last month or so, three of our virtual machines running under VMware have locked up for no reason we can determine. When we power them off and then on again, they partially boot and then hang. I have attached a screenshot of where they hang after the kernel is selected.

The odd thing is that if we select a previous kernel instead of the newest one, they all boot fine (except for the one we are currently having issues with). After booting them that way, we can boot back into the newest kernel and everything seems fine again, with no issues. Or so it was for the first two boxes that did this. The third and latest box refuses to stay up once we get it running on the newest kernel again.

Not sure if I should open a support ticket, since I’m floundering a bit trying to figure out what could be causing this. The only useful info we’ve noticed so far is that the latest box complained about “BTRFS error: could not find root” when we initially brought it up under the older kernel.

Any pointers or suggestions would be appreciated.

Thanks,
Matt

Hi
I’ve asked my SUSE contacts if it’s a known issue.


Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE Leap 42.2|GNOME 3.20.2|4.4.87-18.29-default
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below… Thanks!

The oldest kernel on the box (3.12.62-60.62-default) seems stable. The following two seem to have issues, at least on this box: 4.4.74-92.35 and 4.4.74-92.38.

Hi
Can you provide the output of the boot log?

journalctl -b --no-pager

And post it to, say, http://paste.opensuse.org/ (please sanitize if required) and set it to never expire.

Looking at the output of that, it seems to contain only the information for the latest boot with the old kernel (we can’t do anything useful before it locks up when we get it up with the newest kernel). Is that useful, since it is the newer kernel that has the hanging issue?

Matt

Hi
You can look at past boots…

journalctl --help
journalctl --list-boots

Coredumps?

coredumpctl list
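
For anyone following along, here is a rough sketch of pulling the log of an earlier boot once more than one boot is listed. The function names and the output path are made up for illustration; `-b -1` is journalctl's offset for the previous boot, and this only works if the journal has actually kept earlier boots.

```shell
#!/bin/sh
# Sketch only: helper names and file paths are hypothetical.

# Count the boots listed on stdin (the output of `journalctl --list-boots`).
# Each boot line starts with an offset such as 0 or -1.
count_boots() {
    grep -c -E '^[[:space:]]*-?[0-9]+[[:space:]]'
}

# Save the journal of one boot to a file so it can be pasted somewhere.
# $1 = boot offset (0 = current boot, -1 = previous boot)
# $2 = output file (example: /tmp/boot-prev.log)
save_boot_log() {
    offset="$1"
    outfile="$2"
    journalctl -b "$offset" --no-pager > "$outfile"
}
```

For example, `journalctl --list-boots | count_boots` shows how many boots are available, and `save_boot_log -1 /tmp/boot-prev.log` would write out the previous boot's log.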

I will poke around with the boot logs.

There are no coredumps.

I’ve poked about on a couple of our SLES 12 SP2 boxes, and they all have info only on the current boot:

journalctl --list-boots

0 c0e0018c8b77416ebf4c41cddc2c5e55 Tue 2017-09-26 00:31:01 EDT—Fri 2017-09-29 1

Hi
So it won’t even boot to the newer kernels then before it locks up?

What storage is being used for the VMs?

It will boot up with the newer kernel, and you have just about enough time to log in before it locks up with a crazy high system load.

We have a disk array that I would have to check to verify what it is.
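
Since there is only a short window before the box locks up, one option is to grab a quick snapshot of the load right after logging in. This is just a sketch under assumptions: the log path and function names are invented, and it relies on the procps `ps` found on Linux.

```shell
#!/bin/sh
# Sketch only: helper names and the log path are hypothetical.

# Print the 1-minute load average from a /proc/loadavg-style line on stdin.
load_1min() {
    awk '{ print $1 }'
}

# Append a timestamped snapshot of the load and the busiest processes
# to a file, then flush it to disk in case the box hangs right after.
snapshot_load() {
    out="${1:-/var/tmp/load-snapshot.log}"
    {
        date
        cat /proc/loadavg
        # The STAT column shows process state; many "D" entries usually
        # point at tasks stuck waiting on I/O.
        ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -n 15
    } >> "$out"
    sync
}
```

Running `snapshot_load` a few times in a row, and then reading the file back after booting the older kernel, may show what was driving the load.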

Figured out that no logs from previous boots are being kept. Not sure if SLES 12 or any of the patch levels turns this on by default. I had to do the following:

systemctl restart systemd-journald
mkdir /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald

After the next reboot I had the previous boot's information in journalctl.
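
The journald steps can be wrapped up roughly like this. It is a sketch, assuming journald is left at its default `Storage=auto` setting, where logs persist across reboots only if /var/log/journal exists. The function names are made up, and the enable step needs root.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Report whether journald would keep logs across reboots, judged purely by
# whether the persistent directory exists (assumes Storage=auto).
journal_mode() {
    dir="${1:-/var/log/journal}"
    if [ -d "$dir" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# Create the persistent journal directory with the right ownership and
# restart journald so it starts writing there. Must be run as root.
enable_persistent_journal() {
    dir="${1:-/var/log/journal}"
    mkdir -p "$dir"
    systemd-tmpfiles --create --prefix "$dir"
    systemctl restart systemd-journald
}
```

`journal_mode` printing `volatile` would explain why `journalctl --list-boots` only shows the current boot.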

I’m now in the process of getting a ticket open with support for this stability issue.

Thanks,
Matt

skunkboy wrote:

> Any pointers or suggestions would be appreciated.

Have you verified that the version of SLES12 SP(x) is supported under
the version of VMware you are using?

I don’t know if this applies in your case… but if you are using ESXi
VMware recommends that VMs not be upgraded to new releases of the OS
but instead you do a new install.


Kevin Boyle - Knowledge Partner

Yes, our OS version is supported by the version of ESXi that we are running.

We have been doing clean installs when going from SLES 11 to 12, but not when going from 12 SP1 to 12 SP2. That seems insane; would we have to reinstall the boxes every other year?

Thanks,
Matt


Great! That’s one potential issue eliminated.

When issues are posted to these forums, we never know what the
background of the sysadmin might be and sometimes have to ask rather
obvious questions. :)

Personally, I don’t like the idea of having to do new installs for
major version upgrades but it is what it is.

For others who might be reading this thread, this is the relevant
VMware Knowledge Base (KB) article:
VMware support for guest operating system upgrade (2018695)
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2018695


Kevin Boyle - Knowledge Partner

We have not seen this again since I turned on boot log archiving on the boxes in question. Not sure if this was an actual fix, or we just had some other odd issue. But since it has stopped I’m not currently too worried about it…

Matt