Severe DomU performance regression after xen-4.2.3_08-0.7.1?

Hi,

I finally tracked down a severe performance regression on our
virtualized build servers. Before a patch-and-reboot cycle of the Dom0
in late November 2014, a complete build cycle took ~5h30; afterwards it
took ~9h30!

The culprit seems to be a xen update, starting with either
xen-4.2.4_02-0.7.1 (not yet verified) or xen-4.2.4_04-0.9.1 (verified
slowdown). The latest update, xen-4.2.5_02-0.7.1, is also slow.
I’m currently back to normal performance with kernel-3.0.101-0.15-xen +
xen-4.2.3_08-0.7.1, which was the setup before November 2014. I first
downgraded the kernel, but that didn’t change performance. Only when I
downgraded the xen packages to 4.2.3_08-0.7.1 did performance come back
to normal.

The server is a dual-socket Xeon E5-2650 with 64 GB RAM.
Has anyone experienced something similar? Did a setting or default change
after 4.2.3_08-0.7.1?
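
For a sense of scale, the two build times quoted above can be turned into a relative slowdown. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the inputs are the approximate ~5h30 and ~9h30 figures, not exact measurements.

```python
def to_minutes(duration: str) -> int:
    """Convert a duration written like '5h30' (or '5h') into minutes."""
    hours, minutes = duration.split("h")
    return int(hours) * 60 + int(minutes or 0)


def slowdown(before: str, after: str) -> float:
    """Return how much longer `after` is than `before`, as a fraction of `before`."""
    b, a = to_minutes(before), to_minutes(after)
    return (a - b) / b


if __name__ == "__main__":
    extra = slowdown("5h30", "9h30")
    print(f"build cycle takes about {extra:.0%} longer")
```

With these rough numbers the build cycle takes about 73% longer after the update, i.e. four extra hours per run.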

Franz.

Hi Franz,

I received similar reports in-house, which I attributed to SAN/NAS changes that were implemented for the affected group - but I’m pretty sure that the Xen servers used were updated “to the latest code” then, too… currently running 4.2.5_02-0.7.1.

Are you in a position to open a service request? That would ensure the right people can assist in finding the cause, especially since you seem to be able to reproduce the symptoms!

Regards,
Jens

On 2015-02-24 at 16:44, jmozdzen wrote:[color=blue]

Hi Franz,

I received similar reports in-house, which I attributed to SAN/NAS
changes that were implemented for the affected group - but I’m pretty
sure that the Xen servers used were updated “to the latest code” then,
too… currently running 4.2.5_02-0.7.1.

Are you in a position to open a service request? That would ensure the
right people can assist in finding the cause, especially since you seem
to be able to reproduce the symptoms![/color]

I just found http://www.gossamer-threads.com/lists/xen/devel/367589
which sounds like the same problem. The last message suggests that later
xen versions like 4.3.x and 4.4.x are not affected, so upgrading to
SLES12, which currently ships xen-4.4.1, is also an option to fix this
issue. Another option is to wait a little for SLES11SP4, which should
also have xen-4.4 according to
http://www.susecon.com/doc/2014/sessions/FUT7606.pdf .

As for the SR, that means I would finally have to tackle our messed-up
license status after the SCC/NCC split :) . But I plan to do that; I just
wanted to run a few more tests, like going back to the most current
kernel and also testing 4.2.4_02-0.7.1 for completeness. And as each test
cycle takes at least one day, I won’t open the SR before early next week.

Franz

Hi Franz,

thank you for sharing those details - I’ve forwarded them to my SUSE contact, hoping this will draw SUSE engineering’s attention. Opening an SR will definitely help with that, so if you PM me the SR number once you get around to reporting this issue, I’ll do my best to push from this side, too.

Regards,
Jens

On 2015-02-24 at 18:34, jmozdzen wrote:[color=blue]

Hi Franz,

thank you for sharing those details - I’ve forwarded them to my SUSE
contact, hoping this will draw SUSE engineering’s attention. Opening an
SR will definitely help with that, so if you PM me the SR number once
you get around to reporting this issue, I’ll do my best to push from
this side, too.[/color]

Hi Jens,

I tracked down that the performance regression started with
xen-4.2.4_02-0.7.1 and opened SR 10940181481 for it.
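
The narrowing-down described in this thread boils down to finding the first slow package in an ordered list. Below is a minimal sketch of that idea, assuming the list order matches the release order of the packages and using only the results reported in this thread. `first_bad` is a hypothetical helper, and the results are hard-coded because each real test cycle takes about a day.

```python
from typing import List, Optional, Tuple

# Package versions in (assumed) release order, with the results from this thread.
# True means the build ran at normal speed, False means it was slow.
RESULTS: List[Tuple[str, bool]] = [
    ("xen-4.2.3_08-0.7.1", True),
    ("xen-4.2.4_02-0.7.1", False),
    ("xen-4.2.4_04-0.9.1", False),
    ("xen-4.2.5_02-0.7.1", False),
]


def first_bad(results: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the first version in the list that showed the slowdown, if any."""
    for version, ok in results:
        if not ok:
            return version
    return None


if __name__ == "__main__":
    print(first_bad(RESULTS))
```

With only four candidate versions a linear scan is enough; a binary search would only pay off for a much longer list.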

Franz.

Hi Franz,

[QUOTE=Franz Sirl;26705]Hi Jens,

I tracked down that the performance regression started with
xen-4.2.4_02-0.7.1 and opened SR 10940181481 for it.[/QUOTE]

thank you for the SR number - let’s see what I can do from this side to keep things on the right track :) (Although I doubt that I’ll have to do anything at all, you seem very well prepared for the case.)

Regards,
Jens

On 2015-03-03 at 11:14, jmozdzen wrote:[color=blue]

Hi Franz,

thank you for the SR number - let’s see what I can do from this side to
keep things on the right track :) (Although I doubt that I’ll have to
do anything at all, you seem very well prepared for the case.)[/color]

We received a PTF (Program Temporary Fix) that fixes the issue, and the
SR has been closed. I guess there will soon be a xen update in the patch
channels.

Franz