Kernel Update Breaks OCFS2

Running SLES 11 SP2 with the latest patches (as of 15/7/2013), SLES HAE SP2, OCFS2 version 1.6_3.0.13, Xen.
On our 5-node cluster, OCFS2 gets broken by the latest kernel update.

All other SLES nodes are running kernel 3.0.58.
I updated one server to kernel 3.0.80 and OCFS2 will no longer mount.

ocfs2 Internal logic failure while trying to join the group
Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524201] (mount.ocfs2,21615,11):o2hb_map_slot_data:1638 ERROR: status = -12
Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524208] (mount.ocfs2,21615,11):o2hb_region_dev_write:1768 ERROR: status = -12
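
A note on the error code: assuming the `status` values in these messages follow the usual Linux kernel convention of negative errno numbers, -12 would correspond to ENOMEM (out of memory). A minimal Python sketch that decodes such a line; the regular expression and the `decode_status` helper are illustrative only and not part of any OCFS2 tooling:

```python
import errno
import os
import re

# Matches e.g. "o2hb_map_slot_data:1638 ERROR: status = -12"
STATUS_RE = re.compile(r"(\w+):(\d+) ERROR: status = (-?\d+)")

def decode_status(line):
    """Return (function, errno name, description), or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    func = m.group(1)
    # Assumption: the kernel reports errors as negative errno values.
    code = abs(int(m.group(3)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return func, name, os.strerror(code)

sample = ("Jul 15 12:48:44 sl-bne-hs23-01 kernel: [ 3496.524201] "
          "(mount.ocfs2,21615,11):o2hb_map_slot_data:1638 ERROR: status = -12")
print(decode_status(sample))
```

If that reading is right, the mount fails because an allocation in the heartbeat setup path did not succeed, which may help narrow down what changed between the two kernels.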

After rolling the kernel back to 3.0.58, the system starts up again with no problems and the OCFS2 mounts come up correctly.
Version 3.0.74 was not tested because we needed to get a live system back online. No other patches had to be downgraded on the 5th server, just kernel-xen and kernel-xen-base.

This kernel version has been marked protected on our SLES cluster servers until a resolution is found.
Thanks
Eric

Hi Eric,

I just checked the patch announcement, which doesn’t mention “rolling updates”. I for one would therefore expect that rolling updates are supported, but have asked my contacts at SUSE if I’m reading that correctly. I’ll let you know once I’ve received a response.

Regards,
Jens

Hi Eric,

got my coffee, which made me notice:

  • you only mention “3.0.80”
  • there’s been 3.0.80-0.5, but a 3.0.80-0.7.1, too. The latter was a patch against problems with the 0.5 version
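
To make the difference between the two builds concrete, here is a rough Python sketch that compares version-release strings numerically. It is a simplification, not the full RPM comparison algorithm, and `parse_release` and `is_newer` are hypothetical helpers that only handle purely numeric strings like the ones in this thread:

```python
def to_ints(part):
    """Convert '0.7.1' into (0, 7, 1); an empty string becomes ()."""
    return tuple(int(p) for p in part.split(".")) if part else ()

def parse_release(version):
    """Split '3.0.80-0.7.1' into ((3, 0, 80), (0, 7, 1))."""
    upstream, _, release = version.partition("-")
    return to_ints(upstream), to_ints(release)

def is_newer(candidate, installed):
    """True if candidate sorts after installed (numeric comparison only)."""
    return parse_release(candidate) > parse_release(installed)

print(is_newer("3.0.80-0.7.1", "3.0.80-0.5.1"))    # True
print(is_newer("3.0.58-0.6.2.1", "3.0.80-0.5.1"))  # False
```

The point is that "3.0.80" alone does not identify the build; the release suffix after the dash is what separates the two.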

Are you in a position to verify 3.0.80-0.7.1? There’s no explicit mention of OCFS2 fixes (http://lists.opensuse.org/opensuse-security-announce/2013-07/msg00009.html), but it seems some important internal fixes were included that may very well be worth testing.

Regards,
Jens

Hi Jens,

The full version of the kernel that breaks OCFS2 is
3.0.80-0.5.1-x86_64 from SLES11-SP2

3.0.80-0.7.1 is not on our patch server yet, so no, I’m not in a position to test it. It’s not part of SP3, is it?
Regards
Eric.

Hi Eric,

I just double-checked to confirm yesterday’s status: on my machines I see 3.0.80-0.7.1. We’re running against our own SMT server, which receives the packages from Novell:

v | SLES11-SP2-Updates | kernel-default      | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-default-base | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-firmware     | 20110923-0.17.1 | 20110923-0.19.21.10 | noarch
v | SLES11-SP2-Updates | kernel-xen          | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
v | SLES11-SP2-Updates | kernel-xen-base     | 3.0.58-0.6.2.1  | 3.0.80-0.7.1        | x86_64
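
For anyone who wants to check this across several nodes, here is a small Python sketch that parses rows like the ones above. It assumes the columns are status, repository, package name, current version, available version and architecture (the header row is not shown in this post), and `parse_zypper_lu` is a hypothetical helper name:

```python
def parse_zypper_lu(text):
    """Parse pipe-delimited 'zypper lu' rows into a list of dicts."""
    # Assumed column order; the header is not part of the pasted output.
    fields = ("status", "repository", "name", "current", "available", "arch")
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != len(fields) or cells[0] != "v":
            continue  # skip headers, separators and anything unexpected
        rows.append(dict(zip(fields, cells)))
    return rows

sample = """\
v | SLES11-SP2-Updates | kernel-xen | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
v | SLES11-SP2-Updates | kernel-xen-base | 3.0.58-0.6.2.1 | 3.0.80-0.7.1 | x86_64
"""

for row in parse_zypper_lu(sample):
    print(row["name"], row["current"], "->", row["available"])
```

Running this against the output from each node makes it easy to spot a node whose update source is offering an older build than the others.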

As you can see from the “zypper lu” output above, it’s from the SLES11-SP2 repository, so no, I didn’t confuse it with SP3 - enough coffee this time :wink:
Looking at the repository directory on our SMT server, I see that this update is two weeks old:

2190846669 20540 -rw-r--r--   1 smt      www      21031921 Jun 27 17:10 ./sle-11-x86_64/rpm/x86_64/kernel-xen-3.0.80-0.7.1.x86_64.rpm

So you might want to check your update sources.

Regards,
Jens

Okay, my SMT server says it’s working, but I suspect nobody is home. I’ve killed the SMT service and started it again, and a whole bunch of updates are starting to show up. It will be a while before I can get back to this, as these are production machines. As soon as the opportunity presents itself, I’ll try the later kernel.
Thanks
Eric.

Hi Eric,

if the problem persists after applying the latest updates, please let me know (I’m monitoring this thread) so I can try to get some feedback from SUSE.

Regards,
Jens

Hi Jens,

After resolving my SMT issue (for some reason a file was missing write permissions), I now have the latest kernel patches. However, as these machines are in production and other hardware issues have been resolved, they are now very stable. It’s unlikely that I’m going to get the opportunity to test the later kernel. We are also preparing to move to SP3, which will mean that our whole virtual infrastructure will have to be updated at the same time. This is likely to be the next time the machines are taken down.

I can confirm from our SP3 test environment that there are no problems with OCFS2 under SLES SP3 and HAE SP3.
If an opportunity arises to test the later SP2 kernel, I’ll add an additional post. Otherwise, I’ll have to leave this issue as resolved.
Thanks
Eric.

Hi Eric,

thanks for reporting back on the status. I’ll let my back-end support contacts off the hook then :wink:

[QUOTE]If an opportunity arises to test the later SP2 kernel, I’ll add an additional post. Otherwise, I’ll have to leave this issue as resolved.[/QUOTE]

If you run into any problems, just let us know and we’ll do our best to get things rolling again.

Regards,
Jens