I have been testing SLES 12 with the High Availability Extension installed.
I have 2 servers/nodes with all the latest patches installed.
I have configured DRBD as my shared storage and this is working fine - both nodes can start the VM successfully via Virtual Machine Manager (VMM).
I have configured the VirtualDomain resource with the parameters config=, hypervisor=xen:///, and migration_transport=.
The monitor, start, and stop ops are set to their defaults.
I have tried both xen:///system and xen:///session for the hypervisor setting, but noticed no difference. I have also tried a number of op settings for monitor, start, and stop - again, no difference.
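For reference, here is a minimal sketch of the sort of primitive I mean, in crm shell syntax - the domain name vm1, the config file path, and the op timings below are placeholders rather than my exact values:

    # VirtualDomain primitive (example values - adjust name, path and timeouts)
    primitive vm1 ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/libxl/vm1.xml" \
               hypervisor="xen:///system" \
               migration_transport="ssh" \
        op monitor interval="30s" timeout="60s" \
        op start timeout="120s" interval="0" \
        op stop timeout="120s" interval="0" \
        meta allow-migrate="true" target-role="Started"

The allow-migrate=true meta attribute is only there because I also want live migration to work rather than a stop/start move.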
My two nodes are configured for passwordless SSH login between them - this has been tested successfully.
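As a quick check, something like the following from each node returns the remote hostname without prompting for a password (node2.corp is just an example name):

    ssh root@node2.corp hostname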
When I start the resource it appears in a Stopped state in the Hawk cluster resources view; however, the primitive is shown as Started in its meta-attributes.
When I ‘view details’ on the resource it shows a target role of Started and a fail count of 0 for both nodes. There is no exit reason listed.
When I look at ‘view recent events’ I have 3 entries: ‘Success’ on node 1, ‘Success’ on node 2, and ‘Success’ on node 1 again. No errors are reported anywhere.
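For anyone who prefers the command line, the equivalent checks would be something like this (vm1 is a placeholder resource name):

    crm_mon -1 -r -f                      # one-shot status, including inactive resources and fail counts
    crm resource status vm1               # current state of the primitive
    crm_resource --resource vm1 --locate  # which node, if any, the resource is running on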
I have used the Xen OCF resource agent before without issues, but in SLES 12 there is only the VirtualDomain OCF resource agent.
Can anyone help me get this resource set up so that the VM can be started by the cluster?
One interesting effect I have also noticed: if I try to live migrate a VM from the command line with virsh migrate --live xen+ssh://<node 2>.corp (I also tried the IP address of the node), the VM ends up running on both nodes. There is an error: error: operation failed: Failed to unpause domain.
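For completeness, the full form of the migrate command is roughly the following - the domain name vm1 and the target URI are examples, not my exact values:

    virsh migrate --live vm1 xen+ssh://node2.corp/system

The xen+ssh transport relies on the same passwordless root SSH login mentioned above, so no password prompt appears when the command runs.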
Can anyone shed any light on why this might be?
Thank you for any help,
John