Hello everyone,
We have five HP BL460 Gen9 servers running as Xen hypervisors under SLES 12 SP1; each server hosts 4-5 fully virtualized (HVM) SLES 12 SP1 guests.
About once every two months, a random DomU guest becomes unresponsive. In xl list I see the guest's state as just “------”. I can ping the guest, but ssh and VNC do not respond. The only way to bring the system back is to power it off and restart it from virt-manager.
In the system logs from that time I can see qemu-system-i386 segfaults:
[16312686.295207] IPv6: udp checksum is 0
[18570901.441606] qemu-system-i38[3619]: segfault at 0 ip 00007fcc3b3e1fae sp 00007ffeed8f5068 error 4 in libc-2.19.so[7fcc3b352000+19e000]
[18570901.527129] br0: port 3(vif1.0-emu) entered disabled state
This happens on all of our Xen hypervisors.
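For reference, here is a small Python sketch that decodes the segfault line from the log above. It assumes the usual x86 page-fault error-code bits (bit 0 = page was present, bit 1 = write access, bit 2 = user mode) and that the bracketed values after the library name are the mapping's base address and size in hex. The helper name decode_segfault is just something I made up for illustration.

```python
import re

# Kernel segfault line copied from the log excerpt above.
LINE = ("qemu-system-i38[3619]: segfault at 0 ip 00007fcc3b3e1fae "
        "sp 00007ffeed8f5068 error 4 in libc-2.19.so[7fcc3b352000+19e000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # assumption: the error code is printed in hex
    return {
        "fault_addr": int(m["addr"], 16),  # address that was accessed
        "object": m["obj"],                # mapped file containing the ip
        "offset": ip - base,               # ip relative to the mapping base
        "not_present": not (err & 1),      # bit 0 clear: page not mapped
        "write": bool(err & 2),            # bit 1 set: write access
        "user_mode": bool(err & 4),        # bit 2 set: fault in user mode
    }

info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]), info)
```

If I read it correctly, this is a user-mode read from address 0 (a NULL pointer dereference) at offset 0x8ffae inside libc-2.19.so. With the matching libc debuginfo installed, that offset could probably be resolved to a function name with a tool such as addr2line, but I have not tried that yet.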
Output of xl info:
host : MSK-HVX05
release : 3.12.49-11-xen
version : #1 SMP Wed Nov 11 20:52:43 UTC 2015 (8d714a0)
machine : x86_64
nr_cpus : 40
max_cpu_id : 39
nr_nodes : 2
cores_per_socket : 10
threads_per_core : 2
cpu_mhz : 2297
hw_caps : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000021:000037ab
virt_caps : hvm hvm_directio
total_memory : 262015
free_memory : 159893
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 5
xen_extra : .1_12-2
xen_version : 4.5.1_12-2
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : dom0_mem=5785M,max:5785M
cc_compiler : gcc (SUSE Linux) 4.8.5
cc_compile_by : abuild
cc_compile_domain : suse.de
cc_compile_date : Thu Nov 5 14:42:08 UTC 2015
xend_config_format : 4
I am trying to capture a core dump, but I don’t know which one I need:
- Core dump of the domU kernel
- Core dump of the crashed qemu-system-i386 process on the hypervisor
Please advise which of these I should capture.