SLES 11 SP2 ixgbe problems

Hi,

we are running a backup server under SLES11SP2. The server uses an Intel 10 GbE dual-port adapter (only one port is in use).
Everything seems to run well, but we
p1p1 Link encap:Ethernet HWaddr
inet addr:XX Bcast:XX.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4203653700 errors:1 dropped:75256 overruns:0 frame:1 ← dropped is increasing
TX packets:40383486456 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1395542063354 (1330892.6 Mb) TX bytes:60581195964539 (57774730.6 Mb)

The system is using the following version:
filename: /lib/modules/3.0.13-0.27-default/kernel/drivers/net/ixgbe/ixgbe.ko
version: 3.3.8-k2
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, linux.nics@intel.com
srcversion: E2C2447019ADF6802F97654
vermagic: 3.0.13-0.27-default SMP mod_unload modversions

We see a steadily increasing RX dropped count. Our Nagios check also reports:
NRPE_test_check_generic → net_errs_in CRITICAL - result:24.23 match:>10

Any ideas?
On the switch we see no port errors.
The server uses TSM for backups. It is a Dell R610.

Bye, Peer

On 06/11/2015 10:44, mpibgc wrote:
> we are running a backup server under SLES11SP2. The server uses an
> Intel 10 GbE dual-port adapter (only one port is in use).
> Everything seems to run well, but we
> p1p1 Link encap:Ethernet HWaddr
> inet addr:XX Bcast:XX.255 Mask:255.255.252.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4203653700 errors:1 dropped:75256 overruns:0 frame:1 ← dropped is increasing
> TX packets:40383486456 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:1395542063354 (1330892.6 Mb) TX bytes:60581195964539 (57774730.6 Mb)
>
> The system is using the following version:
> filename: /lib/modules/3.0.13-0.27-default/kernel/drivers/net/ixgbe/ixgbe.ko
> version: 3.3.8-k2
> license: GPL
> description: Intel(R) 10 Gigabit PCI Express Network Driver
> author: Intel Corporation, linux.nics@intel.com
> srcversion: E2C2447019ADF6802F97654
> vermagic: 3.0.13-0.27-default SMP mod_unload modversions
>
> We see a steadily increasing RX dropped count. Our Nagios check also reports:
> NRPE_test_check_generic → net_errs_in CRITICAL - result:24.23 match:>10
>
> Any ideas?
> On the switch we see no port errors.
> The server uses TSM for backups. It is a Dell R610.

Firstly, I’m not sure whether there has been some corruption in your post, as
the third sentence seems to have been cut off.

Anyway, given that you’re using SLES11 SP2, which has been out of General
Support for almost two years, are you able to retry with the latest release
of SLES11, SLES11 SP4?

HTH.

Simon
SUSE Knowledge Partner



Hi Peer,

“frame” errors report failing CRC checks for incoming frames. Typical causes are bad cables and/or ports.

So you might first try replacing the patch cables, and probably the wall outlet as well if the server isn’t connected to the switch directly.

The next step is to change the ports on the switch and on the server.

On rare occasions, the cause is the electrical installation: high-power appliances (motors etc.) can interfere with packet transmission.
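To tell cabling problems apart from drops inside the host, it can help to look at the individual receive counters rather than the summary line from ifconfig. The following is only a rough sketch: it reads the standard per-interface counters the Linux kernel exposes under /sys/class/net/<iface>/statistics/, and the `classify` helper with its grouping into "physical" and "host" counters is a hypothetical heuristic of mine, not something the ixgbe driver or SUSE documents.

```python
from pathlib import Path

# Standard per-interface counters exposed by the Linux kernel in sysfs.
RX_COUNTERS = (
    "rx_packets",
    "rx_errors",
    "rx_dropped",
    "rx_crc_errors",
    "rx_frame_errors",
    "rx_length_errors",
    "rx_missed_errors",
    "rx_fifo_errors",
)


def read_rx_counters(iface, root="/sys/class/net"):
    """Return {counter_name: value} for the counters that exist for iface."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for name in RX_COUNTERS:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def classify(counters):
    """Hypothetical heuristic: which group of counters dominates?"""
    # Errors that usually point at cable, port, or transceiver problems.
    physical = sum(counters.get(k, 0) for k in
                   ("rx_crc_errors", "rx_frame_errors", "rx_length_errors"))
    # Drops that usually happen after the frame arrived intact.
    host = sum(counters.get(k, 0) for k in
               ("rx_dropped", "rx_missed_errors", "rx_fifo_errors"))
    if physical == 0 and host == 0:
        return "clean"
    return "physical" if physical > host else "host"


if __name__ == "__main__":
    stats = read_rx_counters("p1p1")  # interface name taken from the thread
    for name, value in sorted(stats.items()):
        print(f"{name:18} {value}")
    print("hint:", classify(stats))
```

If the CRC and frame counters stay flat while rx_dropped or rx_missed_errors keeps climbing, swapping cables is less likely to help, and the receive path on the host (ring size, interrupt load) may deserve a look as well.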

Regards,
Jens

Hi,
sorry for the delay, and thanks for the answers.
a) We cannot switch to a newer SLES release. This host runs kernel modules for a GFS (StorNext), and that software
currently supports only SLES11SP2 with this kernel release.

b) We already tried a different cable, but maybe we have to switch the port on the core switch, too.

The drop rate is not really high (20000 per month). For a backup client transferring ~1 TB every night, that is still OK.
But the old system (running SLES 10.4) with an older Intel 10 GbE card never showed any similar behavior.
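To put the counter into proportion, here is a quick calculation using only the RX numbers from the ifconfig output in the first post. It is plain arithmetic, not a measurement, and the variable names are mine.

```python
# Counters copied from the ifconfig output in the first post.
rx_packets = 4_203_653_700
rx_dropped = 75_256

# Fraction of received packets that were counted as dropped.
drop_ratio = rx_dropped / rx_packets

print(f"dropped fraction: {drop_ratio:.2e} ({drop_ratio * 100:.4f} %)")
print(f"about 1 drop per {round(rx_packets / rx_dropped):,} received packets")
```

That works out to roughly 0.002 % of received packets, which fits the impression that the nightly backups themselves are not affected.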

Some other settings:
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.rp_filter = 1
fs.inotify.max_user_watches = 65536
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
net.ipv4.tcp_window_scaling = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.ipv4.tcp_mem = 4096 87389 4194304
net.ipv4.conf.all.rp_filter = 1
net.core.rmem_max = 4194304
net.core.rmem_default = 10485760
net.core.wmem_max = 4194304
net.core.wmem_default = 10485760
net.core.optmem_max = 10485760
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_moderate_rcvbuf = 1


ethtool -g p1p1
Ring parameters for p1p1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 1024 ← changed this from 512 to 1024
RX Mini: 0
RX Jumbo: 0
TX: 512
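For anyone who wants to check the remaining headroom automatically, below is a minimal sketch that parses `ethtool -g` output like the one above and compares the current ring sizes with the pre-set maximums. The function name `parse_ring_settings` and the sample text are my own. The script only prints an `ethtool -G` command as a suggestion and does not change anything; whether a larger RX ring actually reduces the drops on this system is an assumption that would have to be tested.

```python
import re

# Sample taken from the ethtool -g output posted in this thread.
SAMPLE = """\
Ring parameters for p1p1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 1024
RX Mini: 0
RX Jumbo: 0
TX: 512
"""


def parse_ring_settings(text):
    """Parse `ethtool -g` output into {'max': {...}, 'current': {...}}."""
    result = {"max": {}, "current": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section:
            # Match only the plain "RX:" / "TX:" lines, not "RX Mini:" etc.
            m = re.match(r"^(RX|TX):\s*(\d+)", line)
            if m:
                result[section][m.group(1)] = int(m.group(2))
    return result


if __name__ == "__main__":
    rings = parse_ring_settings(SAMPLE)
    for ring, current in rings["current"].items():
        maximum = rings["max"].get(ring, current)
        if current < maximum:
            print(f"{ring} ring is {current}, hardware maximum is {maximum}; "
                  f"candidate: ethtool -G p1p1 {ring.lower()} {maximum}")
```

On a live system the text could come from running `ethtool -g p1p1` instead of the embedded sample.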