Xen Performance

Hello all, I would like to hear your ideas about increasing performance on Xen.
I am no Linux expert yet, but I am learning a lot.

The setup is a SLES 11 Xen host with an 8-core Opteron processor and 32 GB of memory.
But the results are not as great as I had hoped.

The network setup is 3 x GbE NICs, all bridged, with about 10 VMs running.

The company’s main program is a Windows-based program with a database, which does not perform as we want it to.
I tried to use the different VMDP drivers, even tried the GPLPV drivers.
Yes, I tried all the offload settings, but to no avail.

On file backup, I am getting about 4000 MB/s;
backing up the database from the main program, it is at most 3000 MB/s.

My questions:
What speeds is everyone else getting with Windows VMs?

I have not done any CPU pinning.
How can I see if that is needed?

Stephan

Hi Stephan,

While you do seem to include quite a bit of info, there’s more needed to understand your situation:

On file backup, I am getting about 4000 MB/s,

That’s 32 Gbps - quite a lot for 3 x GbE NICs. In other words, I feel that you are talking about disk throughput, or you have your numbers mixed up :wink:

When looking at such performance issues, you have to filter out the “data path”: Where is the data coming from, where is it passing through, where does it go in the end?

Once you have your path elements, look at their utilization - and then try to find out how to influence it.

In the case of your backup:

  • is this some network backup of DomU disks? I assume so. Or is it a backup of VM disks to some tape attached to the VM via SCSI?
  • Is only the client system (“main program is a Windows based program with database”) hosted as a VM on this Dom0, or is the backup server hosted there, too?
  • How’s the DomU disk configured (Xen-wise), and what does it look like on the Dom0? Is it SAN storage, NAS, local disk (what type, how accessed)?
  • If the network connection is part of your data path: How is it configured? 3 NICs… those can be individual links or a “link aggregation” - if the latter, of what type? Are all DomUs bridged to the same single (aggregated) link?
  • and so on
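On the Xen side, most of these disk and bridge facts can be read off the Dom0 directly. A sketch, assuming the `xm` toolstack that ships with SLES 11 (`mywinvm` is a placeholder domain name):

```shell
# List running domains and their IDs:
xm list

# Show how the DomU's virtual block devices map onto Dom0 storage
# (file-backed image, LVM volume, SAN LUN, ...):
xm block-list mywinvm --long

# Show the bridge layout - which physical NICs and which DomU vifs
# share each bridge:
brctl show
```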

And then… what have you already measured?

  • You told us about the throughput of the backup client, which seems to be an application-level number from within the DomU.
  • How’s the VM loaded? Is CPU at 100%? How many virtual CPUs are there?
  • How’s the disk subsystem of the Dom0 handling all the reads? Have you measured your DomU’s read throughput without involving transfer via the backup software and other components, i.e. the network?
  • How’s the load on the Dom0? (I’m not talking about that single “load5min” value, but about the whole picture - do you see high i/o waits on the DomU’s physical disk device? Any single CPU maxed out? Network adapters at their throughput limit?)
  • How do the other components cope with the load (switch, backup server (network, backup media, CPU))?
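To gather those numbers on the Dom0, the standard tools are enough. A hedged sketch (package availability and options may differ slightly on SLES 11):

```shell
# Per-domain CPU, memory, and network counters:
xentop

# Per-device disk utilization and i/o wait, sampled every 2 seconds:
iostat -x 2

# Per-CPU utilization - look for any single CPU stuck at 100%:
mpstat -P ALL 2

# Raw per-interface packet/byte counters (NICs, bridges, vifs):
cat /proc/net/dev
```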

And do you by chance have comparison values for purely physical systems, so that you can compare your actual values to those achievable?

How can I see if that is needed ?

By doing a bottle-neck analysis :slight_smile:
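For the CPU-pinning question specifically, a first check is to see where Xen is currently scheduling the vCPUs. A sketch, assuming the `xm` toolstack on SLES 11 (`winvm` is a placeholder domain name; pinning is only worth trying once you have seen a CPU-side bottleneck):

```shell
# Show which physical CPUs each virtual CPU may run on, and where it ran last:
xm vcpu-list

# Example: pin vCPU 0 of domain "winvm" to physical CPU 2:
xm vcpu-pin winvm 0 2
```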

Without further details, it’d be hard to judge where you might tune or have already reached the limit.

Regards,
Jens

If your VMs are Windows, you must purchase and install the VM driver pack, otherwise everything works veryyyyy slowlllllly!!! The driver pack installs the optimised LAN and disk drivers for the Windows VMs, and you get back to gigabit network speeds again. Also, disk access improves considerably. I get about 2.6 GB/min backup over the network to Backup Exec 2014 (LTO3 tape). On the down side, I have noticed that extensive database activity will slow down the other VMs running on the server. (A good example is running Symantec Endpoint Protection 12.1 on one of the VMs.)

Hope this helps

ChasR.

Hi ChasR,

I believe he already tried that:

I tried to use the different VMDP drivers, even tried the GPLPV drivers.

Regards,
Jens

Hello guys,

Thank you for your thoughts.
I checked, and yes, I got my numbers wrong - sorry for this.

Here are some examples ( iperf ) :
Physical workstation => Database server : 5.24 Mbit/s
Virtual WS => Database server : 4.23 Mbit/s
Virtual OES Server PV => Database server : 55.7 Mbit/s
Dom0 => Database server : 67.5 Mbit/s

So what could cause this difference ?
Stephan

Hi Stephan,

Here are some examples ( iperf ) :

…and again no context :frowning:

So what could cause this difference ?

A multitude of things.

Dom0 => Database server

that appears to be a local connection, in terms of “DB server is a DomU on the Dom0 in question”. This will make “network throughput” an almost purely CPU-bound operation, as no actual network connection needs to be used.

Virtual OES Server PV => Database server

“OES Server” is probably on the same Dom0 as the DB server, hence the similar numbers. The decrease may be due to having to cross the DomU/Dom0 boundary twice.

Virtual WS => Database server

You say nothing about that WS - what OS? HVM or PV? Same Dom0? Different subnets, with the router being a device external to the Dom0?

Physical workstation => Database server

Same speed range as “Virtual WS => Database server”… the physical WS to the server will obviously have to go across the Dom0 network interface - is this required for the “virtual WS to server” test, too?

Something to test would be physical WS to Dom0, to get a baseline for your server’s (Dom0) network attachment.

Those numbers are indeed rather low - too low to be acceptable for production use. Let’s go and find the bottle-neck!

How’s the Dom0 CPU utilization during the tests (especially check whether any single CPU is at 100%, i.e. the one in charge of transporting the traffic from Dom0 to DomU), and what is your exact networking setup on the Dom0 and the DomUs in question?
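To make the runs comparable, it helps to invoke iperf the same way each time while watching per-CPU load on the Dom0. A sketch (host name is a placeholder; with bridged networking, a single CPU often carries the bridge/netback work, which an averaged load figure hides):

```shell
# On the machine under test (server side):
iperf -s

# On the client side, a 30-second TCP test against the DB server:
iperf -c dbserver -t 30

# Meanwhile on the Dom0, watch each CPU individually:
mpstat -P ALL 2
```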

Regards,
Jens

[QUOTE=stelgenkamp;23128]Hello guys,

Thank you for your thoughts.
I checked, and yes, I got my numbers wrong - sorry for this.

Here are some examples ( iperf ) :
Physical workstation => Database server : 5.24 Mbit/s
Virtual WS => Database server : 4.23 Mbit/s
Virtual OES Server PV => Database server : 55.7 Mbit/s
Dom0 => Database server : 67.5 Mbit/s

So what could cause this difference ?
Stephan[/QUOTE]

Hi Stephan,

I’m assuming the Xen host is running SLES 11 SP3 and you have the VMDP 2.1 driver pack installed on the Windows domU/VM.

Two things I always test/do for each site where I have Xen running:

  1. Try disabling LSO & LSO2 (and maybe also the other offload settings) on the Windows VM’s network card. Test the NIC throughput to and from server <> client. Is it higher with these settings disabled?

  2. Try disabling the “xvdX disk mode” (I don’t have the exact term in memory) after having installed the SUSE VMDP. This can drastically boost disk performance (which I’ve seen in iSCSI environments). Which setting is needed varies between setups, though, as it seems to depend on different factors (hardware, using fibre or iSCSI, etc.).
    You can disable/enable the xvdX disk mode by running pvctrlw.exe (found in C:\Program Files\xvdp) and setting the disk setting to “none”. Reboot the VM and test whether your disk throughput has increased or not. (HDTune is a nice quick tool to use for this.)
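On the Dom0 side you can also get a rough sequential-read baseline for the backing storage, independent of Windows and the PV drivers. A sketch - the device path is a placeholder; point it at the DomU's actual backing device or image file:

```shell
# Drop the page cache first, so we measure the disk rather than RAM:
sync; echo 3 > /proc/sys/vm/drop_caches

# Read 1 GB from the DomU's backing device; dd reports the throughput:
dd if=/dev/sdb of=/dev/null bs=1M count=1024
```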

Curious what your findings are…

Cheers,
Willem

So, an update:
I fiddled around a lot and moved the database server off the Xen host to a bare-metal server.
Still no improvement. However, I have found some strange things:
Dom0 to database server has improved : 823 Mbit/s
A newer-model WS (W7 64-bit) is at 355 Mbit/s

ALL other WS (Windows XP 32-bit) remain around 5 Mbit/s,
and it makes no difference whether the WS is virtual or not.
Testing one of the slow workstations on the cable of the W7 WS shows no difference,
so this rules out cabling.
All WS have roaming profiles. Could something in the profile be slowing down the network throughput?
Any ideas are welcome.
Stephan

PS. Does anyone know whether one should set the offload settings on the NIC (eth0), on the bridge (br0), or both?

[QUOTE=stelgenkamp;23264]Moved the database server off the Xen host to a bare-metal server.
Still no improvement.[/QUOTE]

That seems to eliminate Xen as the source of the problem.

[QUOTE]ALL other WS (Windows XP 32-bit) remain around 5 Mbit/s,
and it makes no difference whether the WS is virtual or not.[/QUOTE]

Still, I would expect better performance when running on bare metal! How are you measuring this throughput?

[QUOTE]Testing one of the slow workstations on the cable of the W7 WS shows no difference,
so this rules out cabling.[/QUOTE]

When system performance suffers, it could be due to one or more components in the system. The only way to find the bottleneck is to do careful testing in a way that eliminates each of the components one at a time. Remember also that what a small business considers reasonable performance may be completely unacceptable for a large enterprise.

[QUOTE]Any ideas are welcome.
Stephan[/QUOTE]

If you would like our help, we first need to know more about your configuration. Jens (jmozden) asked for additional information which you have not yet provided, and I too have some questions. Performing random tests and then posting the results does not help if we don’t know how those tests were conducted. It would also help if we could identify specific components so it is easier to refer to them. Can you provide device names or IP addresses to uniquely identify the devices we are testing?

First, tell us about the physical network:
[LIST]
[*]Approximately how many physical Windows 7 PCs do you have?
[*]Are they all “fast”, or are some of them slow?
[*]Approximately how many physical Windows XP PCs do you have?
[*]Are they all “slow”, or are some of them fast?
[*]Are all these devices connected to the same switch?
[*]What make/model is your switch?
[*]Are all devices on the same subnet?
[/LIST]

Now, the bottleneck could be anywhere: workstation, server, network…
[LIST]
[*]Can you copy a large file from one “fast” Windows 7 physical workstation to a second “fast” Windows 7 physical workstation and report back what throughput you get and how you are measuring it?
[*]Can you next copy a large file from one “fast” Windows 7 physical workstation to a “slow” Windows XP physical workstation and report back what throughput you get and how you are measuring it?
[/LIST]

These two tests will do a couple of things:
[LIST]
[*]They will establish the maximum throughput we can expect from your “fast” workstations.
[*]They will eliminate Xen, VMs, servers, and databases.
[*]If the throughput is reasonable, they will confirm that the network and the workstations are unlikely to be the bottleneck.
[/LIST]
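However you run the copy, report the throughput the same way each time; the arithmetic is just size divided by elapsed time. A local shell illustration with example paths (on the Windows machines, a stopwatch on a large file copy or robocopy’s summary gives the equivalent number):

```shell
# Create a 100 MB test file (example path):
dd if=/dev/zero of=/tmp/xfer-test bs=1M count=100 2>/dev/null

# Time the copy and compute MB/s:
start=$(date +%s%N)
cp /tmp/xfer-test /tmp/xfer-test.copy
end=$(date +%s%N)

elapsed_ms=$(( (end - start) / 1000000 ))
[ "$elapsed_ms" -eq 0 ] && elapsed_ms=1   # guard against 0 ms on fast disks
mb_per_s=$(( 100 * 1000 / elapsed_ms ))
echo "${mb_per_s} MB/s"

# Clean up:
rm -f /tmp/xfer-test /tmp/xfer-test.copy
```

For a gigabit network, a healthy result is somewhere near 100 MB/s; 5 Mbit/s would be well under 1 MB/s.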

After you provide the requested information, we can decide what the next steps might be.