Random hangs/freeze after latest updates. How to back out?

We have a number of identical desktops running SLED 11 SP2. They have been reliable. However, on Monday night we did a zypper update (from an SMT server that has a test and production environment) and the packages below were updated on production. Since that update we have had 5 or 6 instances of production desktops freezing. The mouse and keyboard stop responding. The computers do not respond to ping. The only solution is to turn off and reboot. /var/log/messages does not give any clues.

Are any of these packages likely culprits? (I expected a kernel update or something…)
What is the best way to back out updates?

The following packages are going to be upgraded:
bridge-utils flash-player google-chrome-stable kpartx libblkid1 libgudev-1_0-0
libpcap0 libreoffice libreoffice-base libreoffice-base-extensions
libreoffice-branding-SLED libreoffice-calc libreoffice-calc-extensions
libreoffice-draw libreoffice-draw-extensions libreoffice-filters-optional
libreoffice-help-en-US libreoffice-icon-themes libreoffice-impress
libreoffice-impress-extensions libreoffice-kde4 libreoffice-mailmerge
libreoffice-math libreoffice-pyuno libreoffice-writer
libreoffice-writer-extensions libudev0 libuuid-devel libuuid1 multipath-tools
popt rpm udev util-linux util-linux-lang uuid-runtime yast2 yast2-nfs-client

I would like to know what hardware models you are using in production, and whether this is the SLED OEM version.

The only way to back out such a change, as far as I’m aware, would be to remove the packages in question and then install the previous versions of them by specifying the version number. E.g.

$ zypper in MozillaFirefox-translations-10.0.2-0.4.1

As for working out what the previous versions are, well you could look on any machines that haven’t been updated if you have such a thing, or I guess you just have to figure it out for each package by looking at the contents of the repos.
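If a machine that hasn’t been updated is available, the lookup can be scripted rather than done by hand. The following is only a rough sketch: `rollback_specs` and `reference.txt` are made-up names for illustration, and it assumes the older versions are still available in your repos.

```shell
# On a machine that has NOT been updated, record what is installed:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | sort > reference.txt

# rollback_specs FILE PKG...
# Hypothetical helper: print "name-version-release" for each PKG found in FILE.
rollback_specs() {
    rs_ref=$1
    shift
    for rs_pkg in "$@"; do
        awk -v p="$rs_pkg" '$1 == p { print $1 "-" $2 }' "$rs_ref"
    done
}

# On an affected machine, review the output first, then install those versions:
#   rollback_specs reference.txt udev libudev0 util-linux
#   zypper in $(rollback_specs reference.txt udev libudev0 util-linux)
```

Packages that can be installed in several versions at once (kernels, for example) will print one line per installed version, so check the list before feeding it to zypper.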

Given the nature of the problem I advise you raise a Service Request. You can do that via https://secure-www.novell.com/center/eservice/

Thanks for your response.

I’m not sure what you mean by OEM version. The install images were downloaded from Novell. We have a basic license that provides updates but no tech support.

Hardware specs
Lenovo ThinkCentre M91p (4480-C2U)
Intel Core i7-2600 (Sandy Bridge)
Q67 Express Chipset
4 GB RAM

Thanks, I can see how to do that. Any thoughts about which packages are most likely to cause such a problem?

I probably can’t open a service request because we only have a basic license with no tech support.

I’m afraid I’ve no idea which of them could cause the problem you describe, but if I had to go through them one by one to work out which is responsible, I’d leave the libreoffice packages until last, unless the freezes occur when you’re running libreoffice. Are you sure those are the only packages that got updated just before the problem started? You can check by looking in /var/log/zypp/history
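For what it’s worth, here is a rough way to pull one day’s installs out of that log. It assumes the libzypp history log is at its usual location, /var/log/zypp/history, and uses pipe-separated fields in the order date|action|name|version|...; `installed_on` is just a name I made up.

```shell
# installed_on DATE [HISTORY_FILE]
# Print "name version" for every package installed on DATE (YYYY-MM-DD).
# Assumes pipe-separated fields: date|action|name|version|...
installed_on() {
    awk -F'|' -v d="$1" \
        '$2 == "install" && index($1, d) == 1 { print $3, $4 }' \
        "${2:-/var/log/zypp/history}"
}

# Example (run as root if the log is not world-readable):
#   installed_on 2013-01-14
```

Running it for the date of each recent update gives you the complete list to compare against what zypper printed at the time.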

Have you been able to discern any common set of circumstances in when the freezes occur?

Are all the machines that exhibit the problem the same hardware spec? (If so, are all machines of that spec affected?)

It may or may not be worth having the affected machines send a copy of their log to another machine. Maybe something useful would show up in the log on the syslog server that doesn’t get written to disk on the local machine, i.e. something gets sent to the log, but the system hangs before the file change is committed, while the packets do make it out. But I’ve really no idea if that would yield anything of value, and it could be a hassle to set up if you’re not familiar with it. This forum thread here looks like it has what you need to get that going. (It’s the sort of thing I set up once every few years and forget all about.)

There are about 19 identical machines, about 7 have frozen. About three have frozen more than once.

These are user desktops. I haven’t been able to find a common circumstance. Several freezes happened with an OpenGL screen saver running, but others have happened before the screen saver kicks in. The user turns away from the computer and when he comes back, it is frozen. Several have happened overnight (most people shut down at night, but not all).

I set up a system to be able to take core dumps, then tried stressing it for a short period and watched the CPU temps climb, but the machine did not freeze. I stopped the stress test and left it alone for a while and found it frozen on a screen saver. Alt+SysRq+C would not initiate a core dump.
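One thing that may be worth ruling out is whether the magic SysRq keys were enabled at all when the machine froze. This is only a sketch; `sysrq_state` is a made-up helper name, and a hang hard enough to stop keyboard interrupts will ignore SysRq regardless of this setting.

```shell
# sysrq_state [FILE]
# Report whether magic SysRq functions are enabled, based on the kernel's
# sysrq setting (0 = disabled, 1 = all enabled, other values = bitmask).
sysrq_state() {
    ss_val=$(cat "${1:-/proc/sys/kernel/sysrq}")
    case "$ss_val" in
        0) echo "disabled" ;;
        1) echo "all enabled" ;;
        *) echo "partially enabled (bitmask $ss_val)" ;;
    esac
}

# As root, enable every SysRq function until the next reboot:
#   echo 1 > /proc/sys/kernel/sysrq
# Check that a kdump crash kernel is loaded (1 = loaded):
#   cat /sys/kernel/kexec_crash_loaded
```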

We don’t have any external logging set up. My suspicion is that it is not a loggable event. It seems pretty low level.

On the ones where you say it happens before the screen saver kicks in, how long before? Are you sure the screen saver hasn’t activated and just doesn’t look like it has, for example because it didn’t get a chance to draw anything on the screen?

Does the problem still occur if you change the screensaver to just a blank screen?

I had a problem some years ago where some machines were freezing up when the default screensaver, which used OpenGL, kicked in. The problem was related to the machines having Intel graphics but also having the Nvidia graphics driver obtained from nvidia.com installed. (My doing.) They were fine for ages and then one day, after updates had been applied, they started to freeze when the screensaver kicked in. Once I removed the Nvidia driver, everything was fine. I can’t find a record of, or recall, which update I identified as having triggered the problem, but I have a feeling it might have been something with Mesa in the name. The actual problem, though, was the presence of the Nvidia driver.

I guess I can’t tell for sure, but there is still a mouse cursor visible.

I tried with the non-OpenGL “lines” screen saver and still had freezes. I don’t think I’ve tried with blank, but will do so.

Currently I am testing with the power save options for the display disabled and have not had a freeze on those test machines yet.

Thanks for the idea about the nvidia drivers. These are intel graphics also. I checked and made sure that nvidia and ATI are not installed.

Well, one of my machines with power save turned off froze anyway. I looked back to the prior set of updates and saw there was a kernel update and a video update so maybe one of them is the cause.

kernel-syms-3.0.51- Mon 14 Jan 2013 03:35:16 PM MST
kernel-xen-devel-3.0.51- Mon 14 Jan 2013 03:35:14 PM MST
kernel-pae-extra-3.0.51- Mon 14 Jan 2013 03:35:04 PM MST
kernel-trace-devel-3.0.51- Mon 14 Jan 2013 03:35:00 PM MST
kernel-pae-3.0.51- Mon 14 Jan 2013 03:34:46 PM MST
kernel-pae-devel-3.0.51- Mon 14 Jan 2013 03:34:39 PM MST
kernel-pae-base-3.0.51- Mon 14 Jan 2013 03:34:28 PM MST
xorg-x11-driver-video- Mon 14 Jan 2013 03:33:27 PM MST
samba-client-3.6.3-0.28.1 Mon 14 Jan 2013 03:34:58 PM MST
avahi-0.6.23-11.23.1 Mon 14 Jan 2013 03:34:56 PM MST
yast2-storage- Mon 14 Jan 2013 03:34:43 PM MST
libldb1-3.6.3-0.28.1 Mon 14 Jan 2013 03:34:38 PM MST
libdns_sd-0.6.23-11.23.1 Mon 14 Jan 2013 03:34:37 PM MST
avahi-lang-0.6.23-11.23.1 Mon 14 Jan 2013 03:34:37 PM MST
perl-doc-5.10.0-64.59.1 Mon 14 Jan 2013 03:34:26 PM MST
yast2-storage-lib- Mon 14 Jan 2013 03:34:25 PM MST
sysstat-8.1.5- Mon 14 Jan 2013 03:34:24 PM MST
cron-4.1-194.205.1 Mon 14 Jan 2013 03:34:24 PM MST
kernel-default-devel-3.0.51- Mon 14 Jan 2013 03:34:14 PM MST
libproxy0-networkmanager-0.3.1-2.6.3 Mon 14 Jan 2013 03:34:12 PM MST
libproxy0-config-kde4-0.3.1-2.6.3 Mon 14 Jan 2013 03:34:12 PM MST
libproxy0-config-gnome-0.3.1-2.6.3 Mon 14 Jan 2013 03:34:12 PM MST
libtevent0-3.6.3-0.28.1 Mon 14 Jan 2013 03:34:11 PM MST
libsmbclient0-3.6.3-0.28.1 Mon 14 Jan 2013 03:34:11 PM MST
libavahi-core5-0.6.23-11.23.1 Mon 14 Jan 2013 03:34:10 PM MST
libavahi-client3-0.6.23-11.23.1 Mon 14 Jan 2013 03:34:10 PM MST
glibc-devel-2.11.3-17.43.1 Mon 14 Jan 2013 03:34:10 PM MST
strace-4.5.18-10.22.1 Mon 14 Jan 2013 03:34:08 PM MST
perl-Bootloader- Mon 14 Jan 2013 03:34:08 PM MST
perl-5.10.0-64.59.1 Mon 14 Jan 2013 03:34:08 PM MST
file-4.24-43.23.1 Mon 14 Jan 2013 03:34:05 PM MST
glibc-locale-2.11.3-17.43.1 Mon 14 Jan 2013 03:34:04 PM MST
nscd-2.11.3-17.43.1 Mon 14 Jan 2013 03:34:00 PM MST
grep-2.7-5.7.1 Mon 14 Jan 2013 03:34:00 PM MST
libzypp-9.12.11-0.5.1 Mon 14 Jan 2013 03:33:59 PM MST
kernel-source-3.0.51- Mon 14 Jan 2013 03:33:41 PM MST
libtdb1-3.6.3-0.28.1 Mon 14 Jan 2013 03:33:26 PM MST
libtalloc2-3.6.3-0.28.1 Mon 14 Jan 2013 03:33:26 PM MST
libproxy0-0.3.1-2.6.1 Mon 14 Jan 2013 03:33:26 PM MST
libwbclient0-3.6.3-0.28.1 Mon 14 Jan 2013 03:33:25 PM MST
libavahi-common3-0.6.23-11.23.1 Mon 14 Jan 2013 03:33:25 PM MST
yast2-dbus-server-2.17.5-0.5.1 Mon 14 Jan 2013 03:33:24 PM MST
libgnutls26-2.4.1- Mon 14 Jan 2013 03:33:23 PM MST
perl-base-5.10.0-64.59.1 Mon 14 Jan 2013 03:33:21 PM MST
glibc-2.11.3-17.43.1 Mon 14 Jan 2013 03:33:17 PM MST
glibc-i18ndata-2.11.3-17.43.1 Mon 14 Jan 2013 03:33:14 PM MST
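A listing in this name-plus-install-date form is what `rpm -qa --last` prints, so the kernel and video driver entries can be picked out with a filter along these lines. `likely_suspects` is a made-up name, and the pattern only reflects my guess about which packages are worth looking at first.

```shell
# likely_suspects
# Filter `rpm -qa --last` style output (read from stdin) down to kernel and
# X.org driver packages. The pattern is a guess at the usual suspects.
likely_suspects() {
    grep -E '^(kernel-|xorg-x11-driver)'
}

# Typical use on an affected machine:
#   rpm -qa --last | likely_suspects
```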

After several days of testing, it appears that backing out the 3.0.51 kernel and returning to the 3.0.42 kernel has stopped the freezes.
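For anyone repeating this, a quick check that a desktop really booted the older kernel might look like the following. `running_kernel_is` is a made-up helper, and the kernel flavor in the lock example is an assumption; use whichever flavor your machines actually boot.

```shell
# running_kernel_is PREFIX
# Succeed if the running kernel release (uname -r) starts with PREFIX.
running_kernel_is() {
    case "$(uname -r)" in
        "$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   running_kernel_is 3.0.42 && echo "old kernel is running"
# Optionally keep zypper from updating the kernel again until a fix appears
# (package name is an assumption; pick your installed flavor):
#   zypper addlock kernel-pae
```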