SLES 11 SP4 with samba 4.4.3 / kernel bug triggered by samba

Dear SUSE,

On my server running SLES 11 SP4 with kernel 3.0.101-71-default, I compiled and installed
the latest Samba release, version 4.4.3. Compiling was not a problem, but running
the software makes the system unstable and produces kernel messages like these:

May 9 15:27:18 geo kernel: [81924.433944] BUG: soft lockup - CPU#0 stuck for 23s! [smbd-notifyd:6073]
May 9 15:27:18 geo kernel: [81924.433948] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit binfmt_misc edd mpt3sas mpt2sas raid_class mptctl mptbase cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables fuse xfs loop dm_mod sr_mod cdrom ipv6 ipv6_lib igb dca ipmi_si ses iTCO_wdt i2c_i801 mei ptp usb_storage ipmi_msghandler joydev pcspkr enclosure iTCO_vendor_support sg pps_core rtc_cmos button container shpchp pci_hotplug ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea isci(X) libsas processor ehci_hcd scsi_transport_sas thermal_sys sd_mod crc_t10dif hwmon usbcore usb_common scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_d
May 9 15:27:18 geo kernel: h ahci libahci libata megaraid_sas scsi_mod
May 9 15:27:18 geo kernel: [81924.434048] Supported: Yes, External
May 9 15:27:18 geo kernel: [81924.434050] CPU 0
May 9 15:27:18 geo kernel: [81924.434052] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit binfmt_misc edd mpt3sas mpt2sas raid_class mptctl mptbase cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables fuse xfs loop dm_mod sr_mod cdrom ipv6 ipv6_lib igb dca ipmi_si ses iTCO_wdt i2c_i801 mei ptp usb_storage ipmi_msghandler joydev pcspkr enclosure iTCO_vendor_support sg pps_core rtc_cmos button container shpchp pci_hotplug ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea isci(X) libsas processor ehci_hcd scsi_transport_sas thermal_sys sd_mod crc_t10dif hwmon usbcore usb_common scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_d
May 9 15:27:18 geo kernel: h ahci libahci libata megaraid_sas scsi_mod
May 9 15:27:18 geo kernel: [81924.434131] Supported: Yes, External
May 9 15:27:18 geo kernel: [81924.434134]
May 9 15:27:18 geo kernel: [81924.434137] Pid: 6073, comm: smbd-notifyd Tainted: G X 3.0.101-71-default #1 Supermicro X9DR3-F/X9DR3-F
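
For anyone who wants to scan their own syslog for the same symptom, here is a minimal Python sketch that pulls the CPU number, stall duration, process name and PID out of a soft lockup line like the first one above. The function name and the regular expression are my own illustration, written against the message format shown in this log only; other kernel versions may word the line differently.

```python
import re

# Matches the watchdog line quoted above, e.g.
# "BUG: soft lockup - CPU#0 stuck for 23s! [smbd-notifyd:6073]".
# The pattern is an assumption based on this one log excerpt.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup message, or None if the line is not one."""
    match = SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "seconds": int(match["seconds"]),
        "comm": match["comm"],
        "pid": int(match["pid"]),
    }


sample = (
    "May 9 15:27:18 geo kernel: [81924.433944] BUG: soft lockup - "
    "CPU#0 stuck for 23s! [smbd-notifyd:6073]"
)
print(parse_soft_lockup(sample))
```

Feeding every line of /var/log/messages through parse_soft_lockup and keeping the non-None results shows at a glance whether smbd-notifyd is always the process involved.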

I found several discussions of this problem. Most people say it is a kernel
bug that Samba can trigger, and Volker Lendecke from SerNet recommends downgrading
to Samba 4.2 and waiting for a kernel in which the issue is fixed, see:

http://samba.2283325.n4.nabble.com/AW-Centos-6-kernel-soft-lockup-CPU-20-stuck-for-67s-smbd-notifyd-after-upgrade-form-4-2-to-4-4-td4702013.html
http://comments.gmane.org/gmane.linux.suse.opensuse.evergreen/153
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980

My question is whether kernel 3.0.101-71-default in SLES 11 SP4 also suffers from this bug
and, if so, when it will be fixed.

Thanks for any answers,

Klaus

On 10/05/16 09:04, kbraun wrote:
[color=blue]

on my server with SLES 11 SP4 with kernel 3.0.101-71-default I tried to
compile and use the latest samba software with version 4.4.3.
[...]
My question is whether kernel 3.0.101-71-default in SLES 11 SP4 also
suffers from this bug and, if so, when it will be fixed.[/color]

Can I ask why you want Samba 4.4.3 on SLES11 SP4? Is it for a particular
security fix or feature?

The latest supported version of Samba available from SUSE for SLES11 SP4
is 3.6.3. Given that SLES is SUSE Linux Enterprise Server, installing
your own compiled version of Samba would be unsupported by SUSE.

If you want Samba 4.x then I suggest you look at SLES12 SP1 as the
latest supported version of Samba there is 4.2.4.

HTH.

Simon
SUSE Knowledge Partner


If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below. Thanks.

Simon,

The reason is that after the last Samba update to version 3.6.3-76.1-3640-SUSE-CODE11-x86_64
(the Badlock patches), some Samba functionality stopped working.

One of the problems is that wbinfo -c no longer works and that the automatic password
change on the Microsoft AD, which Samba normally triggers every 7 days, is broken. In the
Samba log files this leads to messages like

[2016/04/15 19:16:44.095713, 0] rpc_client/cli_netlogon.c:693(rpccli_netlogon_set_trust_password)
credentials chain check failed

once per minute.
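
To confirm the "once per minute" rate on another machine, a small Python sketch along these lines could count the failures per minute. It assumes the two-line entry layout shown above (a bracketed timestamp header followed by the message text); the function name is made up for illustration, and the second entry in the sample is a hypothetical one added only to exercise the code.

```python
import re
from collections import Counter

# Header of a Samba log entry, e.g. "[2016/04/15 19:16:44.095713, 0] file.c:693(func)".
# Only the date and the hour:minute part are captured.
HEADER = re.compile(
    r"^\[(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}\.\d+,\s*\d+\]"
)


def failures_per_minute(log_text, needle="credentials chain check failed"):
    """Count log entries containing `needle`, grouped by the minute in their header."""
    counts = Counter()
    current_minute = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current_minute = header["minute"]
        elif needle in line and current_minute is not None:
            counts[current_minute] += 1
    return counts


sample = (
    "[2016/04/15 19:16:44.095713, 0] rpc_client/cli_netlogon.c:693(rpccli_netlogon_set_trust_password)\n"
    "  credentials chain check failed\n"
    # Hypothetical second entry, one minute later, for illustration only.
    "[2016/04/15 19:17:44.101234, 0] rpc_client/cli_netlogon.c:693(rpccli_netlogon_set_trust_password)\n"
    "  credentials chain check failed\n"
)
print(failures_per_minute(sample))
```

A steady count of one per minute over several hours would match the behaviour described here.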

I asked SUSE about this problem, but unfortunately our university only has a low-level
subscription that does not include support, so I cannot open a call with SUSE. Nevertheless,
I got a friendly answer from Kirk Penrose kpenrose@suse.com, who opened bug 976657 for me, but
so far I have no solution.

Furthermore, I saw that there are three additional patches addressing regressions that the
last security releases introduced in the Samba 3.6 backports. As far as I can see, one of them
should solve the password change problem. The related information can be found in

http://samba.2283325.n4.nabble.com/Badlock-regression-fixes-tt4701810.html

and I also sent this information to Kirk, but I still have no answer.

So that is why I want Samba 4.4.3. In fact, I would much prefer to use the Samba version
shipped with SLES 11 SP4 if the fixes mentioned above could be included. Otherwise
it’s a bit problematic to deal with software that has known limitations.

As for newer Samba 4 releases that I compile myself, it is obvious that SUSE will not
support them. On the other hand, if software can trigger a kernel bug that causes the
system to die slowly, the kernel bug should be fixed, because you cannot expect every
piece of software to be written so that it avoids this kernel problem.

Best Regards,

Klaus

On 10/05/16 13:24, kbraun wrote:
[color=blue]

the reason is that after the last samba update to version
3.6.3-76.1-3640-SUSE-CODE11-x86_64
(badlock patches) I got some problems with the functionality of samba.
[...]
Nevertheless I got friendly answer from Kirk Penrose kpenrose@suse.com
who opened bug 976657 for me, but until now I got no solution.[/color]

Checking bug 976657 referenced above, I see there is an unofficial (thus
unsupported) Samba version 3.6.3-101.1 available for testing with SLES11
SP4 at
http://download.opensuse.org/repositories/network:/samba:/MAINTAINED:/SLE_11/SLE_11/x86_64/

HTH.

Simon
SUSE Knowledge Partner



For another issue with this newest patch-level version of Samba for SLES11 SP4 (3.6.3-76.1), support gave us access to a PTF (Program Temporary Fix) version:

Version 3.6.3-76.1.10865.1.PTF.978898-3660-SUSE-CODE11-x86_64

Maybe you could request this version and see if it addresses your issue, too.

HTH, Tom

On 24/06/16 14:44, swadm wrote:
[color=blue]

for another issue with this newest patch level version of samba for
SLES11 SP4 (3.6.3-76.1) support gave us access to a PTF (program
temporary fix) version:

Version 3.6.3-76.1.10865.1.PTF.978898-3660-SUSE-CODE11-x86_64[/color]

Could you possibly send me your SR and/or bug (if you know it) number
via private message?
[color=blue]

Maybe you could request this version, and see if it addresses your issue,
too.[/color]

Except Klaus isn’t able to create a support ticket …

Thanks.

Simon
SUSE Knowledge Partner



Thank you for your suggestions. Because the file server I am talking about
is in production use, I cannot test whether a special Samba version
works as expected.

To be honest, stability was one of the main reasons why we decided to use
SUSE Linux Enterprise Server in our department, and I was really upset to
find that a Samba patch for SLES caused annoying problems, and there is
still no “official” solution.

For now I am using Samba 4.2, which seems to run stably. Maybe I will
upgrade the server to SLES 12 in the future to solve the problem for good,
but that is another topic.

For those interested in the question I asked at the beginning of the thread,
namely whether a specific kernel bug would be fixed so that more up-to-date
Samba releases such as the 4.4 series can be used, the kernel update
kernel-default-3.0.101-77.1 seems to fix the problem:

The following non-security bugs were fixed:

  • af_unix: Guard against other == sk in unix_dgram_sendmsg (bsc#973570).
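
A rough way to check whether a given machine already runs a kernel at or beyond that update is sketched below in Python. It assumes the fix first shipped in kernel-default-3.0.101-77.1, as the update notes above suggest, and it uses a naive numeric comparison of the release string rather than real RPM version comparison. The helper names are hypothetical.

```python
import platform
import re

# First kernel build assumed to carry the af_unix fix (bsc#973570),
# taken from the update notes quoted above.
FIXED = (3, 0, 101, 77)


def release_key(release):
    """Turn a release string such as '3.0.101-71-default' into (3, 0, 101, 71)."""
    numbers = [int(n) for n in re.findall(r"\d+", release)]
    # Keep only version and build number; the package rebuild suffix
    # (the '.1' in '77.1') does not appear in `uname -r` output.
    return tuple(numbers[:4])


def has_fix(release):
    """Naive check: is this release at or beyond the assumed fixed build?"""
    return release_key(release) >= FIXED


print(has_fix("3.0.101-71-default"))   # the kernel from the original report
print(has_fix(platform.release()))     # the kernel this script runs on
```

This only tells you that the running kernel is new enough by version number; it does not verify that the patch is actually present, so treat it as a first sanity check.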

Hopefully my samba problems are sorted out for now.

Thanks again

Klaus

On 28/06/16 13:44, kbraun wrote:
[color=blue]

To be honest, stability was one of the main reasons why we decided to
use Suse Linux Enterprise Server in our department and I was really
upset when I made the experience that a samba patch from SLES caused
annoying problems - and still there is no “official” solution.

For now I am using samba 4.2, which seems to run stable. Maybe I will
update the server to SLES 12 in the future to solve the problem
definitely, but this is another topic.[/color]

So you’re now using Samba 4.2.x on SLES11 SP4? To be supported I’d
definitely look at upgrading the server to SLES12 SP1 with official
Samba 4.2.x packages.
[color=blue]

For those who are interested in the question I had at the beginning of
the thread, whether a specific kernel bug will be fixed, thus allowing
the use of more up-to-date samba releases like the 4.4 series, the
kernel update kernel-default-3.0.101-77.1 seems to fix the problem:

The following non-security bugs were fixed:

  • af_unix: Guard against other == sk in unix_dgram_sendmsg
    (bsc#973570).
    …[/color]

Thanks for the report back.

Simon
SUSE Knowledge Partner

