Attempting to set up corosync on san1.example.com [192.168.1.1] fails:
[CODE]san1:~ # sleha-init
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
Enabling sshd service
/root/.ssh/id_rsa already exists - overwrite? [y/N] y
Generating ssh key
Configuring csync2
Generating csync2 shared key (this may take a while)...ERROR: Can't create csync2 key[/CODE]
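Presumably the failing step could be reproduced by hand to get a more verbose error, and the two WARNINGs suggest checking what the script sees on eth0 first. (The key path below is my guess at what sleha-init uses internally; adjust it to whatever /etc/csync2/csync2.cfg names.)

[CODE]# What does the bootstrap script fail to detect on eth0?
san1:~ # ip addr show eth0
san1:~ # ip route show dev eth0

# Try generating the csync2 pre-shared key manually (path is an assumption):
san1:~ # csync2 -k /etc/csync2/key_hagroup[/CODE]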
So I set up corosync on san2.example.com [192.168.1.2] instead:
[CODE]san2:~ # sleha-init
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
Enabling sshd service
Generating ssh key
Configuring csync2
Generating csync2 shared key (this may take a while)...done
Enabling csync2 service
Enabling xinetd service
csync2 checking files
Configure Corosync:
This will configure the cluster messaging layer. You will need
to specify a network address over which to communicate (default
is eth0's network, but you can use the network address of any
active interface), a multicast address and multicast port.
Network address to bind to (e.g.: 192.168.1.0) [] 192.168.1.2
Multicast address (e.g.: 239.x.x.x) [239.129.63.45] 239.1.1.2
Multicast port [5405]
Configure SBD:
If you have shared storage, for example a SAN or iSCSI target,
you can use it to avoid split-brain scenarios by configuring SBD.
This requires a 1 MB partition, accessible to all nodes in the
cluster. The device path must be persistent and consistent
across all nodes in the cluster, so /dev/disk/by-id/* devices
are a good choice. Note that all data on the partition you
specify here will be destroyed.
Do you wish to use SBD? [y/N]
WARNING: Not configuring SBD - STONITH will be disabled.
Enabling hawk service
HA Web Konsole is now running, to see cluster status go to:
https://SERVER:7630/
Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
Enabling openais service
Waiting for cluster...done
Loading initial configuration
Done (log saved to /var/log/sleha-bootstrap.log)[/CODE]
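Since sleha-init completed on san2, I assume the one-node ring can be sanity-checked with corosync-cfgtool, and multicast reachability between the two hosts tested with omping (omping isn't part of the base install, and its -m/-p switches for targeting the group and port chosen above are my assumption):

[CODE]# Ring status on the node that initialized successfully:
san2:~ # corosync-cfgtool -s

# Run simultaneously on both nodes to exercise the multicast group:
san1:~ # omping -m 239.1.1.2 -p 5405 192.168.1.1 192.168.1.2
san2:~ # omping -m 239.1.1.2 -p 5405 192.168.1.1 192.168.1.2[/CODE]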
But then san1.example.com [192.168.1.1] complains when joining the cluster:
[CODE]san1:~ # sleha-join
WARNING: Could not detect IP address for eth0
WARNING: Could not detect network address for eth0
Join This Node to Cluster:
You will be asked for the IP address of an existing node, from which
configuration will be copied. If you have not already configured
passwordless ssh between nodes, you will be prompted for the root
password of the existing node.
IP address or hostname of existing node (e.g.: 192.168.1.1) [] 192.168.1.2
Enabling sshd service
/root/.ssh/id_rsa already exists - overwrite? [y/N] y
Retrieving SSH keys from 192.168.1.2
Password:
Configuring csync2
Enabling csync2 service
Enabling xinetd service
WARNING: csync2 of /etc/csync2/csync2.cfg failed - file may not be in sync on all nodes
WARNING: csync2 run failed - some files may not be sync'd
Merging known_hosts
WARNING: known_hosts collection may be incomplete
WARNING: known_hosts merge may be incomplete
Probing for new partitions...ERROR: Failed to probe new partitions[/CODE]
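If I read the join output right, the last two failures come from csync2 and from a partition probe (presumably just partprobe), so both should be reproducible by hand for a more verbose error:

[CODE]# Re-run the file sync verbosely to see which file or host csync2 chokes on:
san1:~ # csync2 -xv

# Re-run the partition probe directly:
san1:~ # partprobe[/CODE]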
I’ve verified that the firewall rules on both hosts have been flushed:
[CODE]san1:~ # iptables -L -v
Chain INPUT (policy ACCEPT 24M packets, 66G bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 25M packets, 37G bytes)
pkts bytes target prot opt in out source destination
[/CODE]
[CODE]san2:~ # iptables -L -v
Chain INPUT (policy ACCEPT 22M packets, 39G bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 17M packets, 43G bytes)
pkts bytes target prot opt in out source destination [/CODE]
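One caveat I'm aware of: iptables -L only lists the filter table, so dumping everything would also rule out leftover rules in the nat and mangle tables:

[CODE]san1:~ # iptables-save
san2:~ # iptables-save[/CODE]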
There are some entries in /var/log/messages on san2.example.com that seem related to this:
[CODE]Dec 17 19:04:51 san2 cib: [3615]: info: cib_stats: Processed 83 operations (1445.00us average, 0% utilization) in the last 10min
Dec 17 19:10:18 san2 crmd: [3620]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Dec 17 19:10:18 san2 crmd: [3620]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Dec 17 19:10:18 san2 pengine: [3619]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 17 19:10:18 san2 crmd: [3620]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1355800218-15) derived from /var/lib/pengine/pe-input-2.bz2
Dec 17 19:10:18 san2 crmd: [3620]: notice: run_graph: ==== Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-2.bz2): Complete
Dec 17 19:10:18 san2 crmd: [3620]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 17 19:10:18 san2 pengine: [3619]: notice: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-2.bz2
Dec 17 19:14:51 san2 cib: [3615]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min[/CODE]
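In case it helps, the bootstrap log mentioned in the sleha-init output should have more detail on what actually failed, e.g.:

[CODE]san1:~ # tail -n 50 /var/log/sleha-bootstrap.log[/CODE]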
What’s going on here? Why can’t san1.example.com set up or join the cluster?
Eric Pretorious
Truckee, CA