NFSv4 problems with kernel-default-3.0.101-0.35.1

SLED_11_SP3

Defect: After applying the latest patches to SLES_11_SP3 and SLED_11_SP3, desktop machines (SLED) with user homedirs mounted from an NFS server (SLES) are unable to run GNOME applications (including Firefox).

Applications fail with the error:
GConf Error: Failed to contact configuration server; some possible causes are
that you need to enable TCP/IP networking for ORBit, or you have stale NFS
locks due to a system crash. See http://projects.gnome.org/gconf/ for
information.

Desktop system eventually freezes completely with symptoms similar to stale NFS filehandles (processes enter uninterruptible sleep, directories become inaccessible)
Other Software and Versions Being Used: MozillaFirefox-24.6.0esr-0.8.1

All software is installed and upgraded from standard SUSE repositories via ‘zypper dup’
Steps to Duplicate: From KDE desktop: Run any GNOME application which uses GConf2

GNOME desktop: won’t start.
Environment Description: 300+ SLED_11_SP3 desktop machines mounting ~2500 user home directories from SLES_11_SP3 servers

All machines updated with latest patches (specifically, kernel-default-3.0.101-0.35.1, glibc-2.11.3-17.66.1)
Problem Environment: Production
Testing Environment: Yes
Fix: Downgrade kernel to 3.0.101-0.31.1 and glibc to 2.11.3-17.62.1 on DESKTOPS only. The NFS file servers continue to run the latest kernel-default-3.0.101-0.35.1.

The glibc downgrade may be unnecessary, but I’ve got it working again and I don’t have another maintenance window for a while.

I don’t know what you’re expecting to happen as a result of your post, but since it is written like a bug report, I wonder whether you expect more than it will achieve. Forums are not an official support channel. If you want support from SUSE, you need to open a Service Request, which you can do via https://www.suse.com/support/. Alternatively, bugs can be reported via https://www.suse.com/support/report-a-bug/.

I’m using NFS home directories and not seeing a problem. It’s NFSv3 from Solaris though rather than NFSv4 from SLES. I do have an open Service Request about an NFSv4 issue with the latest kernel version where ‘\040(deleted)’ appears in the /proc/mounts entry for a volume mounted with autofs. E.g.

foo.here://apps /mnt/auto/apps\040(deleted) nfs4 ro,nosuid,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx.xxx.xxx.xxx,minorversion=0,local_lock=none,addr=yyy.yyy.yyy.yyy 0 0
It is seemingly impossible to unmount the volume due to the presence of ‘\040(deleted)’ in /proc/mounts. I did once see an issue with running applications that reside on the volume but couldn’t recreate it on other machines and haven’t seen it since. The issue does not occur with kernel 3.0.101-0.31.1. I didn’t downgrade glibc.
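For anyone wanting to check a batch of clients for this symptom, here is a minimal sketch in Python. It assumes the marker shows up exactly as in the entry quoted above, i.e. the mount point field ends in the literal text \040(deleted) and the filesystem type is nfs or nfs4. The function name find_deleted_nfs_mounts is made up for illustration; it only inspects text in /proc/mounts format and does not try to unmount anything.

```python
# Literal text as it appears in /proc/mounts: a backslash, "040", then "(deleted)".
DELETED_SUFFIX = "\\040(deleted)"


def find_deleted_nfs_mounts(mounts_text):
    """Return (device, mount_point, fstype) for NFS entries carrying the marker.

    mounts_text is expected in /proc/mounts format: whitespace-separated
    fields, with device, mount point and filesystem type first.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        if fstype in ("nfs", "nfs4") and mount_point.endswith(DELETED_SUFFIX):
            # Report the mount point without the marker for readability.
            hits.append((device, mount_point[:-len(DELETED_SUFFIX)], fstype))
    return hits
```

On a client, passing in the contents of /proc/mounts (for example open("/proc/mounts").read()) should list any affected volumes; an empty list means no entry carries the marker.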

Yeah, I’ve reported the bug at https://www.suse.com/support/report-a-bug/ and am trying to establish from my organisation if we have a support contract allowing me to raise a service request. In the meantime I just wanted to log this somewhere that I could refer back to.

Reverting glibc to the current latest, 2.11.3-17.66.1, makes no difference; glibc is a red herring.

For the benefit of anyone following this thread:

The issue involved is specific to NFS4 and was introduced with kernel 3.0.101-0.35.1. It was reported and handled in Novell/SUSE Bugzilla report 892200. The fix is tentatively scheduled to be included in a future maintenance release of the SP3 kernel. Anyone needing a temporary fix on SLES* 11 SP3 can obtain a PTF kernel at:

https://ptf.suse.com/f2cf38b50ed714a8409693060195b235/sles11-sp3/7260/x86_64/20140918/

Alternatively, you can back-rev to kernel 3.0.101-0.31.1, which does not have this issue.

The PTF is no longer needed, as the fixes have been publicly released in kernel 3.0.101-0.40.1.

FYI, the fix described above is for NFS4, and the problem was introduced in 3.0.101-0.35. However, also introduced in -0.35 was another NFS bug that can affect any NFS mount, regardless of version; it is fixed in the PTF and in kernel 3.0.101-0.40.1 as well. So… regardless of the NFS version you use, if your system is an NFS client, you don’t want to be on the “stock” kernel 3.0.101-0.35.
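To audit a fleet of NFS clients against the versions named in this thread, a rough Python sketch follows. It flags only the 3.0.101-0.35 build, since that is the only release the thread identifies as bad (0.31.1 and 0.40.1 are reported as good, and nothing is said here about any builds in between). The helper name is_affected_kernel is hypothetical, and the accepted string formats (the package version "3.0.101-0.35.1", or a release string beginning "3.0.101-0.35") are assumptions about what your tooling reports. A PTF kernel may report a different string and is not handled.

```python
import re

# Matches "<kernel version>-<major>.<minor>" at the start of the string,
# e.g. "3.0.101-0.35.1" or "3.0.101-0.35-default".
_RELEASE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)\.(\d+)")


def is_affected_kernel(release):
    """True only for the 3.0.101-0.35 build named in this thread."""
    match = _RELEASE.match(release)
    if match is None:
        return False  # unrecognised format: make no claim
    version, major, minor = match.groups()
    return version == "3.0.101" and (major, minor) == ("0", "35")
```

For the running kernel, something like is_affected_kernel(platform.release()) (after import platform) would be one way to call it, assuming the release string has the shape described above.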