HA Configuration Issue

Hi,

I am facing an issue with my HA cluster. I have 2 nodes, but each server only shows its own node as online.

NODE 1

[CODE]SRVPHN85:~ # crm status
Stack: corosync
Current DC: SRVPHN85 (version 1.1.15-19.15-e174ec8) - partition WITHOUT quorum
Last updated: Tue Jul 25 15:39:36 2017
Last change: Tue Jul 25 14:04:02 2017 by root via cibadmin on SRVPHN85

1 node configured
1 resource configured

Online: [ SRVPHN85 ]

Full list of resources:

admin_addr (ocf::heartbeat:IPaddr2): Stopped[/CODE]


NODE 2

[CODE]SRVPHN87:~ # crm status
Stack: corosync
Current DC: SRVPHN87 (version 1.1.15-21.1-e174ec8) - partition WITHOUT quorum
Last updated: Tue Jul 25 15:35:00 2017
Last change: Tue Jul 25 15:05:57 2017 by root via cibadmin on SRVPHN87

1 node configured
0 resources configured

Online: [ SRVPHN87 ]

Full list of resources:[/CODE]


I need to resolve this issue. Any help is appreciated.

Hi Raju,

Could you please share the output of “crm configure show”? I suspect there is a problem with network communication (multicast) between the nodes. Please change it to unicast and try to join the second node again.
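
If multicast between the nodes is in doubt, omping can test it directly. A minimal sketch, assuming the mcastaddr/mcastport from your corosync.conf (run it on both nodes at the same time):

[CODE]# run simultaneously on SRVPHN85 and SRVPHN87;
# each side should see unicast and multicast replies from the other
omping -m 239.108.147.175 -p 5405 SRVPHN85 SRVPHN87
[/CODE]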

Thanks

Hi,

Thanks for the reply. I think UDP ports 5404 and 5405 are blocked between the nodes. Will this cause any issue?

[CODE]SRVPHN85:~ # crm configure show
node 184357468: SRVPHN85
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        show \
        stonith-enabled=false[/CODE]


[CODE]SRVPHN87:~ # crm configure show
node 184357468: SRVPHN87
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        show \
        stonith-enabled=false[/CODE]


Sorry, my mistake. Here is the correct output:

[CODE]SRVPHN85:~ # crm configure show
node 184357467: SRVPHN85 \
        attributes standby=off
primitive admin_addr IPaddr2 \
        params ip=xx.xx.xx.xx \
        op monitor interval=10 timeout=20 \
        meta target-role=Started
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-19.15-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=false \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true[/CODE]


[CODE]SRVPHN87:~ # crm configure show
node 184357468: SRVPHN87
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-21.1-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        show \
        stonith-enabled=false[/CODE]

Please share the “/etc/corosync/corosync.conf” file from both nodes.

NODE 1

[CODE]# Please read the corosync.conf.5 manual page
totem {
        version: 2
        secauth: on
        crypto_hash: sha1
        crypto_cipher: aes256
        cluster_name: hacluster
        clear_node_high_bit: yes

        token: 5000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 6000
        max_messages: 20

        interface {
                ringnumber: 0
                bindnetaddr: xx.xx.xx.xx
                mcastaddr: 239.108.147.175
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 3
        two_node: 0
}[/CODE]

NODE 2

[CODE]# Please read the corosync.conf.5 manual page
totem {
        version: 2
        secauth: on
        crypto_hash: sha1
        crypto_cipher: aes256
        cluster_name: hacluster
        clear_node_high_bit: yes

        token: 5000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 6000
        max_messages: 20

        interface {
                ringnumber: 0
                bindnetaddr: xx.xx.xx.xx
                mcastaddr: 239.108.147.175
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 3
        two_node: 0
}[/CODE]

You have to change a few things.

1. Change the network transport to unicast (udpu):

[CODE]totem {
        version: 2
        secauth: on
        crypto_hash: sha1
        crypto_cipher: aes256
        cluster_name: hacluster
        clear_node_high_bit: yes
        transport: udpu
        token: 5000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 6000
        max_messages: 20

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.220.0
                mcastport: 5405
                ttl: 1
        }
}
[/CODE]
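
Note that with udpu, corosync also needs a nodelist naming every member, since there is no multicast discovery. A minimal sketch (the addresses are placeholders for the real node IPs):

[CODE]nodelist {
        node {
                # placeholder for SRVPHN85's IP
                ring0_addr: xx.xx.xx.85
                nodeid: 1
        }
        node {
                # placeholder for SRVPHN87's IP
                ring0_addr: xx.xx.xx.87
                nodeid: 2
        }
}
[/CODE]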

2. Change the quorum section (for two nodes):

[CODE]quorum {

    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1

}

[/CODE]
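
After restarting corosync on both nodes, the membership and quorum state can be verified; with the settings above the flags should include 2Node:

[CODE]# expect: Expected votes: 2, Flags: 2Node Quorate
corosync-quorumtool -s
[/CODE]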

3. Set proper cib-bootstrap options (for two nodes):

[CODE]property cib-bootstrap-options: \
        stonith-enabled=true \
        placement-strategy=balanced \
        no-quorum-policy=ignore \
        stonith-action=reboot \
        startup-fencing=false \
        stonith-timeout=150
[/CODE]
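
These can be set one at a time from the crm shell, for example:

[CODE]crm configure property no-quorum-policy=ignore
crm configure property stonith-action=reboot
crm configure property stonith-enabled=true
[/CODE]

Note that stonith-enabled=true also requires a working STONITH device to be configured; without one, Pacemaker will refuse to start resources.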

[QUOTE=arunabha_banerjee;38879]You have to change a few things. …[/QUOTE]


How can I remove the cluster from both nodes and start the installation from scratch?

It seems both nodes are behaving like individual single-node clusters; please use “sleha-join -c” on the second node to resolve the issue.
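
As a rough sketch, assuming the SLE HA bootstrap scripts are installed and SRVPHN85 is kept as the first node:

[CODE]# on SRVPHN87: stop the cluster stack, then join the existing cluster
systemctl stop pacemaker
sleha-join -c SRVPHN85
[/CODE]

If you want to rebuild from scratch instead, sleha-init can be re-run on the first node before joining the second.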


[QUOTE=raju7258;38869]I am facing an issue with my HA cluster. I have 2 nodes, but each server only shows its own node as online. … Need help.[/QUOTE]


“Please check that the servers are syncing time with NTP properly.”
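
A quick check, assuming classic ntpd is in use (common on SLES at that time):

[CODE]# the selected peer is marked with '*'; the offset should be small
ntpq -p
[/CODE]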

-Nitiratna Nikalje

Have you checked that the firewall ports are open?
I have seen this behaviour when the nodes cannot communicate with each other.

As you haven’t mentioned which version of SLES you are using, I assume SLES 15.
It uses firewalld by default, and firewalld doesn’t include a high-availability service definition out of the box.

On my openSUSE Leap 15.1 test system I am using the following:

[CODE]# cat /etc/firewalld/services/high-availability.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>Custom High Availability Service</short>
  <description>This allows you to use the High Availability service. Ports are opened for corosync, pacemaker_remote, dlm, hawk and corosync-qnetd.</description>
  <!-- standard ports for the services listed in the description -->
  <port protocol="udp" port="5404-5405"/>
  <port protocol="tcp" port="3121"/>
  <port protocol="tcp" port="21064"/>
  <port protocol="tcp" port="7630"/>
  <port protocol="tcp" port="5403"/>
</service>
[/CODE]
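
With that file in place, the custom service can be enabled, for example (the zone is an assumption; adjust to your setup):

[CODE]# reload so firewalld picks up the new service definition,
# then enable it permanently and apply
firewall-cmd --reload
firewall-cmd --permanent --zone=public --add-service=high-availability
firewall-cmd --reload
[/CODE]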