We have a significant performance problem here, and I’d like to know whether it is “normal” and/or whether we can do anything about it in terms of performance tuning.
We’re running two SLES11SP1 + HAE nodes with the latest patches applied and the Xen kernel. The problem has existed from the beginning (applying patches has not changed anything so far). Both nodes access two shared Fibre Channel-based LUNs (disks), with OCFS2 file systems on top. We’re using Pacemaker as the cluster stack (heartbeat) for OCFS2. A third node (without OCFS2 or anything similar) runs as a standby node and has taken the DC role; both OCFS2 nodes are plain members. DLM is only active on these two nodes.
One of the file systems is meant for virtual disk images (mounted on /var/lib/xen/images) but is currently empty; the other holds VM config files and xend lock files (mounted on /etc/xen/vm).
(“xend lock files” refers to the xend-config.sxp options “(xend-domain-lock yes)” and “(xend-domain-lock-path /etc/xen/vm/vm_locks)”, which are active on both nodes.)
For debugging purposes, we run a periodic access check (“time ls -lR /var/lib/xen/images/” and “time ls -lR /etc/xen/vm/vm_locks”) on both nodes and monitor how long each command takes to complete.
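For anyone who wants to reproduce this kind of measurement, the following sketch approximates the “time ls -lR” check in Python: it walks the directory tree, calls `os.lstat` on every entry (the per-entry metadata lookups that `ls -l` has to perform, which on a cluster file system such as OCFS2 may involve lock traffic between the nodes) and reports the wall-clock time in milliseconds. It is not the script we actually use; the function names, the 10-second interval and the round count are illustrative assumptions, and it assumes a Python 3.9+ interpreter, which a stock SLES 11 system may not provide.

```python
import os
import sys
import time


def walk_and_stat(root: str) -> int:
    """Roughly mimic `ls -lR root`: visit every entry below root and lstat it.

    Returns the number of entries examined. The output formatting that ls
    itself does is not reproduced; only the metadata lookups are.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))  # like ls -l: do not follow symlinks
            count += 1
    return count


def timed_probe(root: str) -> tuple[int, float]:
    """Return (entries examined, elapsed wall-clock milliseconds) for one walk."""
    start = time.perf_counter()
    entries = walk_and_stat(root)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return entries, elapsed_ms


def run_periodically(roots: list[str], interval_s: float, rounds: int) -> None:
    """Probe each root `rounds` times, sleeping `interval_s` seconds between rounds."""
    for i in range(rounds):
        for root in roots:
            entries, ms = timed_probe(root)
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {root}: {entries} entries, {ms:.1f} ms")
        if i < rounds - 1:
            time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical defaults: 6 rounds, 10 s apart, over the paths given on the command line.
    run_periodically(sys.argv[1:], interval_s=10.0, rounds=6)
```

Usage would be something like `python3 probe.py /var/lib/xen/images /etc/xen/vm/vm_locks` on each node, where probe.py is just a placeholder file name.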
With /var/lib/xen/images, we always get sub-second response times (< 100 ms typically) on both nodes.
With /etc/xen/vm, things are very different:
- When the FS is mounted on only one node, the test is just as quick.
- When the FS is also mounted on the second node and xend locks are used there as well, the times on the first node jump to around 2000 ms (with peaks of 5 seconds) and the second node takes 6 to 10 seconds to respond, sometimes even longer.
- I currently have no verified data for the case “both nodes mounted, but locks used on only one node”.
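To compare the nodes on the same terms (“typical” versus “peak” response times), the collected timings could be reduced to a median, a 95th percentile and a maximum per node. This is a minimal sketch, assuming the samples have already been gathered per node as lists of milliseconds; the helper names are hypothetical, and the 95th percentile uses the simple nearest-rank method rather than interpolation.

```python
import math
import statistics


def summarize_ms(samples: list[float]) -> dict[str, float]:
    """Summarize probe timings (milliseconds) as typical and peak values.

    p95 uses the nearest-rank method: the smallest sample such that at
    least 95% of all samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


def compare_nodes(node_a: list[float], node_b: list[float]) -> None:
    """Print the median, p95 and max of two nodes' samples side by side."""
    a, b = summarize_ms(node_a), summarize_ms(node_b)
    for key in ("median", "p95", "max"):
        print(f"{key:>6}: node A {a[key]:8.1f} ms   node B {b[key]:8.1f} ms")
```

Reporting the median alongside the maximum keeps occasional outliers from hiding (or dominating) the typical delay on each node.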
The delays obviously have to do with distributed locking across both nodes. But since neither node is under significant load (nor is the FC server), and the networks between the nodes (the production network plus a dedicated connection via a separate switch) are almost idle, these values seem rather high to me.
Is there anyone out there who could share their experiences with me?