SLES11 SP2 to SP3 update: system now unstable

I updated a server running SLES11 from SP2 to SP3, and in the 24 hours since then the server has rebooted twice for no apparent reason. Under SP2, the uptime was at least one year. Does anyone have any suggestions as to how to find the reason for this instability? I have of course looked in the logs, but I cannot find anything that explains this behaviour. In both cases, /var/log/messages contained nothing for at least 5 minutes before the reboots.
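
To put a number on that silence, the last syslog entry before each boot can be located with a short script. This is only a sketch: it assumes the traditional “Mon DD HH:MM:SS” timestamp format (which carries no year, so the year is passed in), and the marker string “syslog-ng starting up” is an assumption about what the first line logged after a boot looks like on this system. The function names and parameters are illustrative.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp, or return None."""
    try:
        return datetime.strptime(
            "{0} {1}".format(year, line[:15]), "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def gaps_before_boot(lines, boot_marker="syslog-ng starting up", year=None):
    """Return (boot_time, seconds_of_silence, last_line_before_boot) tuples.

    boot_marker is an assumed string; change it to whatever the first
    message after a boot is on the system being examined.
    """
    if year is None:
        year = datetime.now().year
    results = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if boot_marker in line and prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            results.append((stamp, gap, prev_line.rstrip()))
        prev_time, prev_line = stamp, line
    return results
```

Run over the lines of /var/log/messages, each result gives the boot time, the seconds of silence before it, and the last line that was logged. A long gap with no shutdown messages in that last line would be consistent with a hard reset rather than an orderly reboot.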

The update procedure was routine: I applied all pending patches for SP2, and then used wagon to apply SP3. No errors or difficulties were encountered during this process.

The server is hosting several virtual machines (qemu-kvm), and the behaviour of libvirtd is the only other odd thing that I have noticed. After starting or restarting the libvirtd service, “virsh list” appears to hang, although after several minutes it does start responding as normal. Also virt-manager now takes several minutes after opening a connection before the guests are displayed. I do not know if this is connected with the server reboots.

The other thing that I can think of that may be relevant is that for historical reasons we have disabled IPv6 on this system, although if this is likely to be a source of problems we could reconsider this.

I would be very grateful for any hints or suggestions.
Regards
Peter.

Hi Peter,

First of all, those multi-minute delays are definitely worth looking at in detail, but that is probably a separate issue.

[QUOTE=pakeller]I have of course looked in the logs but I cannot find anything that explains this behaviour. In both cases, /var/log/messages contained nothing for at least 5 minutes before times of the reboots.[/QUOTE]

Two additional places to check are

  • the system event log (the motherboard’s BMC log of hardware-related events; see the “ipmitool sel” command)

  • your KVM logs, to see if anything unusual occurred right before the reboot that didn’t make it into syslog.

I take it that you not only migrated to SP3, but installed current updates, too? I don’t recall if “wagon” does that for you, so you might want to add a call to “zypper up” for good measure…

Regards,
Jens

Hi Jens,

[QUOTE=jmozdzen;27758]Hi Peter,

first of all, those multi-minute delays are definitely worth being looked at in detail - but this is probably a different story.

I have of course looked in the logs but I cannot find anything that explains this behaviour. In both cases, /var/log/messages contained nothing for at least 5 minutes before times of the reboots.

Two additional places to check are

  • system event log (the motherboard’s BMC log of hardware-related events.- see “ipmitool sel” command)[/QUOTE]

I looked both with ipmitool sel and via the IPMI HTTP interface (it is a SuperMicro motherboard and BMC): no events since 2013.

No files at all under /var/log/kvm, if that is where you meant. The individual guest logs under /var/log/libvirt/qemu only show the guests starting up again after the server stopped.
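
For anyone checking the same thing, the guest logs can be scanned for restarts that were not preceded by a shutdown entry. This is a minimal sketch, assuming the per-guest logs contain timestamped “starting up” and “shutting down” lines of the form “YYYY-MM-DD HH:MM:SS.mmm+0000: starting up”; if the libvirt version in use writes a different format, the pattern needs adjusting. The function name is illustrative.

```python
import re

# Assumed line format: "2014-05-12 03:17:45.123+0000: starting up"
EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\+\d{4}: "
    r"(starting up|shutting down)"
)


def unclean_restarts(lines):
    """Return timestamps of 'starting up' entries that follow an earlier
    start with no 'shutting down' entry in between."""
    unclean = []
    running = False
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue  # ignore qemu command lines and other output
        stamp, event = match.groups()
        if event == "starting up":
            if running:
                # The previous run never logged a shutdown.
                unclean.append(stamp)
            running = True
        else:
            running = False
    return unclean
```

Feeding it the lines of each file under /var/log/libvirt/qemu lists the times at which a guest came back without ever having been shut down, which should line up with the host reboots.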

I double-checked: it seems that there may have been a problem with refreshing the SP3 repos. However after a refresh the only updates pending were to zypper and PackageKit related packages. I have applied them FWIW, but I doubt that these were responsible.

So far no more crashes for 9 hours: I’ll see how it holds up overnight.

If I can’t solve this quickly, I would be inclined to schedule some downtime and re-install the OS from DVD rather than try too hard to rescue a borked upgrade.

Thanks,
Peter.

Hi Peter,

[QUOTE=pakeller;27762]Hi Jens,
[…]
I double-checked: it seems that there may have been a problem with refreshing the SP3 repos. However after a refresh the only updates pending were to zypper and PackageKit related packages.[/QUOTE]

After updating those, please be sure to re-run the updater. Those patches update the updating infrastructure itself, so any further updates are only picked up on subsequent runs.
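
The re-run can be automated along the following lines. This is a sketch under assumptions, not a tested procedure for this system: it relies on zypper's documented exit code 103 (ZYPPER_EXIT_INF_RESTART_NEEDED), which signals that the package manager itself was patched and the command should be run again, and the zypper version on SP3 should be checked against its man page first. The function name, the default command and the round limit are illustrative.

```python
import subprocess

# Documented zypper exit code: the package management stack was updated
# and the same command needs to be executed once more.
ZYPPER_EXIT_INF_RESTART_NEEDED = 103


def patch_until_settled(
    run=subprocess.call,
    command=("zypper", "--non-interactive", "patch"),
    max_rounds=5,
):
    """Run the update command repeatedly while zypper asks for a re-run.

    Returns the list of exit codes, one per round. `run` is injectable
    so the loop can be exercised without touching the real system.
    """
    codes = []
    for _ in range(max_rounds):
        code = run(list(command))
        codes.append(code)
        if code != ZYPPER_EXIT_INF_RESTART_NEEDED:
            break  # finished, or failed for some other reason
    return codes
```

Any exit code other than 103 ends the loop, so the last entry in the returned list still needs to be inspected to tell success from failure.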

As for re-installing from DVD rather than trying too hard to rescue a borked upgrade: I fully agree on that… good that you have this option.

Regards,
Jens