Hello
I need some help: sometimes a VM running SLES12 SP3 (LTSS) gets stuck after a reboot. For around 20 minutes there is no reaction at all! After that, the kernel loads and the system starts.
I don’t have access to the VMware infrastructure, and I need to prove that the issue could come from there.
I looked at boot.log, messages and dmesg, but nothing seems wrong apart from the delay.
For example, here is an excerpt from /var/log/messages:
2021-01-05T07:00:01.584829+01:00 suse1 CRON[27153]: (root) CMD (/sbin/shutdown -r 2>&1 >/dev/null)
(…)
2021-01-05T07:01:01.597972+01:00 suse1 systemd[1]: network.target: Found ordering cycle on network.target/stop
2021-01-05T07:01:01.598592+01:00 suse1 systemd[1]: network.target: Found dependency on unmountnfs.service/stop
2021-01-05T07:01:01.602189+01:00 suse1 systemd[1]: network.target: Found dependency on sysinit.target/stop
2021-01-05T07:01:01.621744+01:00 suse1 systemd[1]: network.target: Found dependency on systemd-tmpfiles-setup.service/stop
2021-01-05T07:01:01.621844+01:00 suse1 systemd[1]: network.target: Found dependency on local-fs.target/stop
2021-01-05T07:01:01.621937+01:00 suse1 systemd[1]: network.target: Found dependency on var-backup.mount/stop
2021-01-05T07:01:01.622090+01:00 suse1 systemd[1]: network.target: Found dependency on network.target/stop
2021-01-05T07:01:01.622192+01:00 suse1 systemd[1]: network.target: Breaking ordering cycle by deleting job unmountnfs.service/stop
2021-01-05T07:01:01.622286+01:00 suse1 systemd[1]: unmountnfs.service: Job unmountnfs.service/stop deleted to break ordering cycle starting with network.target/stop
2021-01-05T07:01:01.622462+01:00 suse1 systemd[1]: wickedd.service: Found ordering cycle on wickedd.service/stop
2021-01-05T07:01:01.622557+01:00 suse1 systemd[1]: wickedd.service: Found dependency on local-fs.target/stop
2021-01-05T07:01:01.622643+01:00 suse1 su: pam_unix(su-l:session): session closed for user htuser
2021-01-05T07:31:28.344203+01:00 suse1 dmeventd[616]: dmeventd ready for processing.
2021-01-05T07:31:28.344231+01:00 suse1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
2021-01-05T07:31:28.344566+01:00 suse1 kernel: [ 0.000000] Initializing cgroup subsys cpu
2021-01-05T07:31:28.344567+01:00 suse1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
2021-01-05T07:31:28.344568+01:00 suse1 kernel: [ 0.000000] Linux version 4.4.180-94.113-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Fri Dec 13 14:20:57 UTC 2019 (c6649f6)
2021-01-05T07:31:28.344568+01:00 suse1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.180-94.113-default root=/dev/mapper/vg_root-lv_root resume=/dev/sda2 splash=silent quiet showopts biosdevname=0 splash=verbose consoleblank=0 nomodeset
and then the machine boots normally.
This machine is restarted every day (to work around a memory leak in a custom script), and it got stuck twice in the last 3 weeks.
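Since the machine reboots every day, one way to see how long each reboot really takes, and to check that only the two incidents stand out, is to look for long silences in the log. The Python sketch below assumes the RFC 3339 timestamps shown in the excerpt (e.g. 2021-01-05T07:00:01.584829+01:00) and an arbitrary 5-minute threshold; it avoids f-strings and datetime.fromisoformat so it should not need a recent Python 3. Keep in mind that the first kernel lines after a boot carry the time rsyslog wrote them (the dmeventd line just before them has almost the same time), so each gap covers the end of shutdown, the firmware/GRUB phase and the early kernel boot together.

```python
#!/usr/bin/env python3
"""Find long silences in an rsyslog file such as /var/log/messages.

Assumes every line starts with an RFC 3339 timestamp like
2021-01-05T07:00:01.584829+01:00; lines without one are skipped.
"""
import sys
from datetime import datetime, timedelta


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None."""
    token = line.split(" ", 1)[0]
    # Older Python 3 strptime wants "+0100" for %z, not "+01:00": drop the colon.
    if len(token) > 6 and token[-3] == ":" and token[-6] in "+-":
        token = token[:-3] + token[-2:]
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(token, fmt)
        except ValueError:
            pass
    return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (line_before, line_after, gap) for each silence >= threshold."""
    prev_ts = prev_line = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None and ts - prev_ts >= threshold:
            yield prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts
        prev_ts, prev_line = ts, line


def main(argv):
    path = argv[1] if len(argv) > 1 else "/var/log/messages"
    try:
        f = open(path, errors="replace")
    except OSError as exc:
        print("cannot read {}: {}".format(path, exc), file=sys.stderr)
        return
    with f:
        for before, after, gap in find_gaps(f):
            print("{}  silence between:".format(gap))
            print("    " + before[:100])
            print("    " + after[:100])


if __name__ == "__main__":
    main(sys.argv)
```

Run it against the current file and against older rotated copies (decompressed first if they are compressed); on a normal day the reboot gap should be short and fairly constant, so the two incidents should be easy to spot.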
If something went wrong during shutdown, I don’t see any errors about it.
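To double-check the shutdown side, a small sketch can print the last lines logged before each boot and mark the ones worth reading, such as the systemd ordering-cycle messages visible in the excerpt. It assumes that a boot shows up as kernel lines stamped 0.000000, and the keyword list is only a guess to adjust. The last line or two of each tail may already belong to the new boot (in the excerpt, the dmeventd message is written just before the kernel lines).

```python
#!/usr/bin/env python3
"""Print the last lines logged before each boot, marking suspicious ones."""
import re
import sys
from collections import deque

# Assumption: a boot shows up as kernel lines stamped [    0.000000].
KERNEL_TS = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")
# Assumption: keywords worth a closer look; adjust to taste.
SUSPICIOUS = re.compile(r"ordering cycle|fail|timed? ?out|error|stop job",
                        re.IGNORECASE)


def shutdown_tails(lines, depth=30):
    """Yield (first_boot_line, tail) with the `depth` lines logged before it."""
    recent = deque(maxlen=depth)
    last_kernel_ts = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = KERNEL_TS.search(line)
        if m:
            ts = float(m.group(1))
            # Kernel timestamps restart from zero on every boot, and several
            # early lines share 0.000000: only the first of them opens a boot.
            if ts == 0.0 and (last_kernel_ts is None or last_kernel_ts > 0.0):
                yield line, list(recent)
                recent.clear()
            last_kernel_ts = ts
        recent.append(line)


def main(argv):
    path = argv[1] if len(argv) > 1 else "/var/log/messages"
    try:
        f = open(path, errors="replace")
    except OSError as exc:
        print("cannot read {}: {}".format(path, exc), file=sys.stderr)
        return
    with f:
        for boot_line, tail in shutdown_tails(f):
            print("=== before boot: " + boot_line[:80])
            for line in tail:
                print(("!! " if SUSPICIOUS.search(line) else "   ") + line)


if __name__ == "__main__":
    main(sys.argv)
```

Comparing the tail of a normal reboot with the tail of one of the slow ones shows whether the slow ones stopped logging at a different point of the shutdown.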
I suspect an issue with the VMware infrastructure, since we don’t get any warning about a lack of resources (CPU, disk, memory), but how could I prove it?
And if this is not an infrastructure issue, where could I find traces of what happens before the kernel loads?
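As far as the guest itself goes, little or nothing is recorded before the kernel starts, but the guest can at least tell when the kernel started, which splits the delay in two. The sketch below subtracts the first field of /proc/uptime (seconds since boot) from the current time. If that instant is close to the first boot messages, the time was lost before Linux started (end of shutdown, VM reset, firmware or GRUB), which points away from the Linux boot itself; if it is close to the last shutdown message, the time went into the boot after the kernel started. It assumes the guest clock has not been stepped since boot (for example by NTP or VMware Tools time sync), so it is best run soon after an affected reboot.

```python
#!/usr/bin/env python3
"""Estimate when the running kernel started, from /proc/uptime."""
import sys
from datetime import datetime, timedelta


def kernel_start(now, uptime_text):
    """Return now minus the first field of /proc/uptime (seconds since boot)."""
    return now - timedelta(seconds=float(uptime_text.split()[0]))


def main():
    try:
        with open("/proc/uptime") as f:
            text = f.read()
    except OSError as exc:
        print("cannot read /proc/uptime: {}".format(exc), file=sys.stderr)
        return
    # Naive local time, to compare by eye with the local timestamps in messages.
    start = kernel_start(datetime.now(), text)
    print("kernel started around " + start.isoformat())


if __name__ == "__main__":
    main()
```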