I am facing an "NMI watchdog: BUG: soft lockup" issue. The system hangs and cannot be reached from any terminal or by ping.
The issue happens in a virtual machine.
Host CPU information:

    # lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 160
    On-line CPU(s) list: 0-159
    Thread(s) per core: 2
    Core(s) per socket: 10
    Socket(s): 8
    NUMA node(s): 8
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 47
    Stepping: 2
    CPU MHz: 1064.000
    BogoMIPS: 4800.28
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 30720K
    NUMA node0 CPU(s): 1-10,41-50
    NUMA node1 CPU(s): 11-20,51-60
    NUMA node2 CPU(s): 21-30,61-70
    NUMA node3 CPU(s): 31-40,71-80
    NUMA node4 CPU(s): 0,81-89,120-129
    NUMA node5 CPU(s): 90-99,130-139
    NUMA node6 CPU(s): 100-109,140-149
    NUMA node7 CPU(s): 110-119,150-159
virt-manager version on the host:

    # virt-manager --version
    0.9.4
Host OS information:

    # uname -r
    3.0.101-0.47.79-default
    # cat /etc/SuSE-release
    SUSE Linux Enterprise Server 11 (x86_64)
    VERSION = 11
    PATCHLEVEL = 3
Guest CPU information (in virt-manager, I selected "Copy host CPU configuration"):

    # lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 42 bits physical, 48 bits virtual
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s): 4
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 2
    Model name: QEMU Virtual CPU version 1.4.2
    Stepping: 3
    CPU MHz: 2400.084
    BogoMIPS: 4800.16
    Hypervisor vendor: KVM
    Virtualization type: full
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 4096K
    NUMA node0 CPU(s): 0-3
    Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid tsc_known_freq pni cx16 popcnt hypervisor lahf_lm pti
Guest OS information:

    # uname -r
    4.12.14-197.37-default
    # lsb-release -a
    LSB Version: n/a
    Distributor ID: SUSE
    Description: SUSE Linux Enterprise Server 15 SP1
    Release: 15.1
    Codename: n/a
I checked the following settings in the guest system:

    # cat /proc/sys/kernel/tainted
    0
    # cat /proc/sys/kernel/watchdog
    1
    # cat /proc/sys/kernel/watchdog_thresh
    10
    # cat /proc/sys/kernel/nmi_watchdog
    0
    # cat /proc/sys/kernel/soft_watchdog
    1
    # cat /proc/sys/kernel/softlockup_panic
    0
    # cat /proc/sys/kernel/unknown_nmi_panic
    0
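For reference, the same values can be read in one pass with a small loop instead of separate `cat` calls. This is only a sketch: the function name `show_watchdog_settings` and its optional directory argument are made up for illustration, and the file list is simply the one checked above.

```shell
#!/bin/sh
# Print the watchdog-related kernel settings listed above, one per line.
# The optional directory argument is hypothetical (handy for trying the
# function against a copy of the files); it defaults to /proc/sys/kernel.
show_watchdog_settings() {
    dir="${1:-/proc/sys/kernel}"
    for name in tainted watchdog watchdog_thresh nmi_watchdog soft_watchdog \
                softlockup_panic unknown_nmi_panic; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            # Not every kernel exposes every file, so report a gap
            # instead of failing.
            printf '%s = (not available)\n' "$name"
        fi
    done
}

show_watchdog_settings "$@"
```

Running it again after changing a setting makes it easy to see whether the new value is actually in effect.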
I then made the following changes in the guest system:

    # echo 0 > /proc/sys/kernel/watchdog
    # echo 0 > /proc/sys/kernel/soft_watchdog
    # echo 20 > /proc/sys/kernel/watchdog_thresh
The issue persists, and the host OS cannot be upgraded to a newer SLES version. Could you please advise on any possible solutions?