Can't take node out of standby after upgrade to SP2

I have a two-node HAE cluster that I am trying to upgrade to SP2. After upgrading one node to SP2, I can’t take it out of standby.

Here is what I have done so far:

Put node1 into standby
Did “zypper update” to bring everything up to date on SP1 and rebooted
Installed the SP2 migration packages for SLES, HAE, and SDK
Did “suse_register -d 2 -L /root/.suse_register.log”
Did “zypper dup”
Rebooted.
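
For reference, the sequence above can be written out as a command checklist. This is only a sketch: it prints the commands rather than running them, since the two reboots rule out a single unattended run. The `crm node standby` and `crm node online` lines are the non-interactive crm shell forms and are an assumption about how the steps were entered. The migration package names are not given in this post, so that step stays a comment.

```shell
#!/bin/sh
# Sketch of the per-node upgrade sequence from the post.
# It only PRINTS the commands; nothing is executed.
NODE="${NODE:-node1}"   # node name as used in the post

print_upgrade_steps() {
    cat <<EOF
crm node standby $NODE
zypper update
reboot
# install the SP2 migration packages for SLES, HAE, and SDK (names not given in the post)
suse_register -d 2 -L /root/.suse_register.log
zypper dup
reboot
crm node online $NODE
EOF
}

print_upgrade_steps
```

The last line is the step that fails with the error quoted below.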

When I go into “crm”, to “node” and do “online”, I get:
Error setting standby=off (section=nodes, set=): Application of an update diff failed
Error performing operation: Application of an update diff failed

Any ideas?

To sum it up: node1 is in standby and has been upgraded to SP2, while node2 is online and still at SP1. crm status on node1 shows node1 standby and node2 offline; on node2 it shows node1 standby and node2 online.

Any ideas???
Surely there isn’t a problem that would prevent SP1 and SP2 from co-existing long enough to do an upgrade? If so, how do you do an upgrade without taking the cluster down?
Or, could something have failed on my install on node1?

Any help would be greatly appreciated if anyone has the answer. Otherwise, it looks like I’m going to be opening a support incident.

Allen Beddingfield
Systems Engineer
The University of Alabama

Hi Allen,

According to the documentation (http://www.suse.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/part_install.html chapter D.3), a rolling update is supported. The steps given in D.2 (which are, according to D.3, basically correct) do not mention putting the node into standby, but I would have assumed that to work (and that the node could be brought back online afterwards, of course).

crm status on node 1 shows node1 standby and node2 offline, and on node 2 it shows node1 standby and node2 online

It seems that, for some reason, you are facing a split-brain situation, e.g. caused by communication failures (firewalls, configuration trouble during or after the upgrade, driver problems, etc.). Is there anything in the logs that might point you in the right direction?
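
As a starting point, here is a minimal sketch of checks to run on both nodes and compare. It assumes corosync as the communication layer, iptables as the firewall, and /var/log/messages as the log file; none of that is confirmed in this thread, so adjust as needed. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: communication checks to run on BOTH nodes, then compare the results.
# DRY_RUN=1 (the default) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $1"
    if [ "$DRY_RUN" != "1" ]; then
        sh -c "$1"
    fi
}

check_cluster_comms() {
    # One-shot view of membership and resources as this node sees them.
    run "crm_mon -1"
    # Ring status of the communication layer (assumes corosync).
    run "corosync-cfgtool -s"
    # Firewall rules that might be dropping cluster traffic (assumes iptables).
    run "iptables -L -n"
    # Recent cluster-related log lines (assumed log location).
    run "grep -iE 'corosync|crmd|cib' /var/log/messages | tail -n 50"
}

check_cluster_comms
```

If the ring status or the membership view differs between the two nodes, that would point towards a communication problem rather than a failed package upgrade.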

Regards,
Jens