We have a global file system (Quantum StorNEXT) up and running in an HA config.
Two MDCs are controlling the file systems. The hardware is two DELL 1950s, each with 16 GB RAM, a 3 GHz CPU and a 4 Gb FC HBA.
Both are running SLES 10 SP4 (x86_64).
We exchanged one machine. The new one is a DELL R620 with 32 GB RAM, a 3.3 GHz E5 CPU and an 8 Gb FC HBA.
On this node we installed SLES 11 SP1 (higher is not supported for the software).
The node is working well, but its performance as metadata server for the file system
is 10-20% SLOWER compared to the 1950!
In both cases we are using only the default drivers; QLogic HBAs are installed in both machines.
Any idea what's going on? We never set any kernel-specific parameters …
It's the first time I have seen a new machine performing slower than an old one!
Have you already tried to identify the current bottleneck, e.g. CPU, network, FC, local disks? Is running mixed OS releases supported with your HA solution?
It might be the setup of the new machine, but it might as well be some incompatibility between the two HA nodes leading to massive overhead. Has the network access changed (e.g. 100 Mbps to Gigabit)? Is the link properly configured (auto-negotiation sometimes does funny things to your link configuration)?
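If you want a quick first look, something like this snapshot sketch (Python with the third-party psutil package; generic OS counters, nothing StorNEXT-specific) shows where the time is going:

[CODE]
#!/usr/bin/env python3
"""Snapshot the usual suspects over a short interval and print deltas.
Sketch only; requires the third-party psutil package."""
import time
import psutil

INTERVAL = 5  # seconds per sample window

# cpu_times_percent() blocks for INTERVAL and reports the split itself:
# high iowait hints at storage, high system time at drivers/interrupts.
cpu = psutil.cpu_times_percent(interval=INTERVAL)
print(f"CPU: user={cpu.user}% system={cpu.system}% iowait={cpu.iowait}%")

d0, n0 = psutil.disk_io_counters(perdisk=True), psutil.net_io_counters(pernic=True)
time.sleep(INTERVAL)
d1, n1 = psutil.disk_io_counters(perdisk=True), psutil.net_io_counters(pernic=True)

for disk in sorted(d0.keys() & d1.keys()):
    rd = (d1[disk].read_bytes - d0[disk].read_bytes) / INTERVAL / 2**20
    wr = (d1[disk].write_bytes - d0[disk].write_bytes) / INTERVAL / 2**20
    print(f"{disk}: read {rd:.1f} MiB/s, write {wr:.1f} MiB/s")

for nic in sorted(n0.keys() & n1.keys()):
    rx = (n1[nic].bytes_recv - n0[nic].bytes_recv) / INTERVAL / 2**20
    tx = (n1[nic].bytes_sent - n0[nic].bytes_sent) / INTERVAL / 2**20
    errs = (n1[nic].errin + n1[nic].errout) - (n0[nic].errin + n0[nic].errout)
    print(f"{nic}: rx {rx:.1f} MiB/s, tx {tx:.1f} MiB/s, new errors {errs}")
[/CODE]

High iowait points at the SAN/disk path, high system time at drivers or interrupts, and climbing NIC error counters at the link itself.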
Regards,
Jens
Yes,
I did some basic tests. StorNEXT needs a SAN connection and a network connection: the SAN to get at the metadata, the network to
talk to the clients. First I tested the raw read performance using different block sizes.
The system seems to use the full 8 Gb HBA bandwidth. Small I/Os are not easy to test, because I have no free
devices to test with and cannot overwrite parts of the live file system.
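The sweep looked roughly like this sketch (modern Python 3 for readability, so not literally what ran on the SLES boxes; the device name is a placeholder, and dd with varying bs= values does the same job):

[CODE]
#!/usr/bin/env python3
# Sketch of the block-size sweep -- not the exact tool used. DEVICE is
# a hypothetical placeholder; point it at a LUN or file you may read
# from (reads are non-destructive, but run this on an idle path).
import os
import time

DEVICE = "/dev/sdX"     # placeholder -- adjust!
TOTAL = 1 * 2**30       # read 1 GiB per block size

for bs in (4096, 65536, 1 << 20, 4 << 20):
    fd = os.open(DEVICE, os.O_RDONLY)
    # Ask the kernel to drop cached pages so we measure the device,
    # not the page cache (Linux-only hint, Python 3.3+).
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    done = 0
    t0 = time.monotonic()
    while done < TOTAL:
        buf = os.read(fd, bs)
        if not buf:          # end of device/file
            break
        done += len(buf)
    elapsed = time.monotonic() - t0
    os.close(fd)
    print(f"bs={bs:>8}: {done / elapsed / 2**20:7.1f} MiB/s")
[/CODE]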
Network: I did some small tests transferring smaller files using rsync. The tg3 module (new hardware) seems to be slower than
the bnx module on the old hardware. But no errors …
I'll check this.
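To take rsync's disk and checksum overhead out of the picture, I plan to measure the raw TCP path between the two nodes, either with iperf or a throwaway script like this sketch (modern Python, hypothetical file name and port):

[CODE]
#!/usr/bin/env python3
# Bare TCP pusher to isolate the NIC/driver path (iperf does the same,
# only better). Hypothetical helper: run 'nic_test.py server' on one
# node, then 'nic_test.py client <host>' on the other.
import socket
import sys
import time

PORT = 50007              # arbitrary unused port -- adjust
CHUNK = b"\0" * 65536     # 64 KiB per send
TOTAL = 2 * 2**30         # push 2 GiB

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    got, t0 = 0, time.monotonic()
    while True:
        data = conn.recv(1 << 20)
        if not data:
            break
        got += len(data)
    print(f"{got / (time.monotonic() - t0) / 2**20:.1f} MiB/s from {addr[0]}")
    conn.close()

def client(host):
    sock = socket.create_connection((host, PORT))
    sent = 0
    while sent < TOTAL:
        sock.sendall(CHUNK)
        sent += len(CHUNK)
    sock.close()

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
[/CODE]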
It generally is a good idea to monitor production servers and log historical data. That way you can identify changes in usage and their impact (e.g. reduced/increased network usage, distribution of CPU cycles, response times, …), and of course it provides availability monitoring, too. Maybe someone is already running such a tool, so you could get an actual picture and check it against pre-hardware-swap values?
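Even without a full monitoring stack, a trivial logger along these lines (a sketch using the psutil package; the log path is a placeholder) already gives you comparable before/after numbers:

[CODE]
#!/usr/bin/env python3
# Append one CSV row per minute with cumulative CPU/disk/net counters,
# so pre- and post-change values can be diffed later. Sketch only; a
# real setup would use a proper monitoring tool instead.
import csv
import time
import psutil

LOGFILE = "/var/log/perf-history.csv"   # hypothetical path

with open(LOGFILE, "a", newline="") as f:
    writer = csv.writer(f)
    # Header is re-written on every start -- good enough for a sketch.
    writer.writerow(["epoch", "cpu_busy_pct", "iowait_pct",
                     "disk_read_bytes", "disk_write_bytes",
                     "net_rx_bytes", "net_tx_bytes"])
    while True:
        cpu = psutil.cpu_times_percent(interval=60)  # 1-minute sample
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        # Raw cumulative counters are logged; rates come from diffing
        # consecutive rows when analysing the file.
        writer.writerow([int(time.time()), 100 - cpu.idle, cpu.iowait,
                         disk.read_bytes, disk.write_bytes,
                         net.bytes_recv, net.bytes_sent])
        f.flush()
[/CODE]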
We are monitoring quite a lot of things, but first you have to know where to look (CPU, memory, disk, HBA, network, …)!
However, we could track it down to the Broadcom NIC! It's hard to believe, but the
dumb BIOS setting "BEST PERFORMANCE per watt" was active. We changed this setting to "MAX PERFORMANCE"
and doubled the performance of the NIC!
So it was at least partly my fault: I thought the BIOS was already set to performance, but I never expected such an influence.
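For anyone hitting something similar: such a BIOS profile can sometimes be spotted from inside Linux as well, e.g. as a throttling cpufreq governor. A quick check I should have run earlier (a sketch reading standard Linux sysfs paths; whether they exist depends on kernel and driver):

[CODE]
#!/usr/bin/env python3
# Print the cpufreq governor per core plus current/max frequency for
# cpu0. A BIOS "performance per watt" profile may show up here as
# 'ondemand' or 'powersave' instead of 'performance'.
import glob

for path in sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor")):
    core = path.split("/")[5]              # e.g. 'cpu0'
    with open(path) as f:
        print(core, f.read().strip())

for name in ("scaling_cur_freq", "scaling_max_freq"):
    try:
        with open("/sys/devices/system/cpu/cpu0/cpufreq/" + name) as f:
            print(name, f.read().strip(), "kHz")
    except OSError:
        pass  # cpufreq interface not exposed by this kernel/driver
[/CODE]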
Oh so true! And even when you look in the right place, you still have to interpret it right… in your case, you'd probably have seen less network traffic than before the hardware change, but no errors or the like, seemingly implying that fewer users than before are using the system…
[QUOTE=mpibgc;14629] However, we could track it down to the Broadcom NIC! It's hard to believe, but the
dumb BIOS setting "BEST PERFORMANCE per watt" was active. We changed this setting to "MAX PERFORMANCE"
and doubled the performance of the NIC!
So it was at least partly my fault: I thought the BIOS was already set to performance, but I never expected such an influence.
Always double check the BIOS settings …
Thanks and bye, Peer[/QUOTE]
Thank you for the feedback - I wouldn’t have come up with such a setting :o, glad you found it!