corosync [TOTEM ] Type of received message is wrong... ignoring

johngoutbeck · January 13, 2021, 5:20pm

SLES 15 SP2 + HA
3-node cluster (1 node in standby to provide quorum)

First, corosync ring 1 was marked FAULTY:
**
nss-vmh01:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 1084755185
RING ID 0
id = 192.168.12.241
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.241
status = Marking ringid 1 interface 192.168.11.241 FAULTY
**
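With three hosts, reading each ring status by eye gets tedious. Below is a minimal Python sketch that reports each ring's address and whether its status mentions FAULTY. It assumes the corosync 2.x-style corosync-cfgtool -s layout shown above (RING ID, id and status lines). The tool normally indents those fields, which the paste above lost, so the parser accepts any whitespace around "=". parse_ring_status is a hypothetical helper name, and other corosync versions may print a different format.

```python
import re

RING_RE = re.compile(r"^RING ID (\d+)$")
# Fields may be indented/tab-separated ("\tid\t= ..."), so accept any
# whitespace around "=" after stripping the line.
FIELD_RE = re.compile(r"^(id|status)\s*=\s*(.*)$")


def parse_ring_status(text):
    """Map ring number -> (interface address, status text, faulty flag)."""
    rings = {}
    ring = None
    addr = None
    for raw in text.splitlines():
        line = raw.strip()
        header = RING_RE.match(line)
        if header:
            ring, addr = int(header.group(1)), None
            continue
        field = FIELD_RE.match(line)
        if ring is None or not field:
            continue  # e.g. "Printing ring status." or "Local node ID ..."
        key, value = field.groups()
        if key == "id":
            addr = value
        else:
            rings[ring] = (addr, value, "FAULTY" in value)
    return rings


# Output from nss-vmh01 before the repair, as posted above.
sample = """Printing ring status.
Local node ID 1084755185
RING ID 0
id = 192.168.12.241
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.241
status = Marking ringid 1 interface 192.168.11.241 FAULTY
"""

for ring, (addr, status, faulty) in sorted(parse_ring_status(sample).items()):
    print(ring, addr, "FAULTY" if faulty else "ok")
# prints:
# 0 192.168.12.241 ok
# 1 192.168.11.241 FAULTY
```

On a live node the text could come from subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout.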

Then I re-enabled the failed ring with corosync-cfgtool -r; now both rings are good:
**
nss-vmh01:~ # corosync-cfgtool -r
Re-enabling all failed rings.
nss-vmh01:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 1084755185
RING ID 0
id = 192.168.12.241
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.241
status = ring 1 active with no faults
nss-vmh01:~ # crm cluster status
Name: nss-cluster1

Services:
corosync active/running/disabled
pacemaker active/running/disabled

Printing ring status.
Local node ID 1084755185
RING ID 0
id = 192.168.12.241
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.241
status = ring 1 active with no faults
**
BUT all 3 hosts now log these messages to both /var/log/messages and the corosync log file (/var/log/cluster/corosync.log). They take up lots of space and make it hard to read other messages:
**
nss-vmh01:~ # tail -f /var/log/messages
2021-01-12T11:15:57.584211-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 76.
2021-01-12T11:15:57.966715-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 106.
2021-01-12T11:15:58.349326-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 22.
2021-01-12T11:15:58.731932-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -36.
2021-01-12T11:15:59.114563-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -89.
2021-01-12T11:15:59.497116-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -22.
2021-01-12T11:15:59.879862-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Received message corrupted... ignoring.
2021-01-12T11:16:00.262404-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -4.
2021-01-12T11:16:00.645121-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -80.
2021-01-12T11:16:01.027637-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 90.
**
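While troubleshooting, it can help to split these warnings out of the log and see how often they arrive. Below is a minimal Python sketch. It assumes the syslog format shown above, where each line starts with an ISO 8601 timestamp with a UTC offset. The pattern accepts both "..." (what corosync prints) and the single "…" character that some copy/paste paths produce. split_totem_noise and mean_interval are hypothetical helper names.

```python
import re
from datetime import datetime

# The two TOTEM warnings from this thread; accept "..." or a single "…".
NOISE_RE = re.compile(
    r"\[TOTEM \] (?:Type of received message is wrong|Received message corrupted)"
    r"(?:\.\.\.|…) ignoring"
)


def split_totem_noise(lines):
    """Return (kept_lines, noise_timestamps) for an iterable of syslog lines.

    Assumes the first field of every line is an ISO 8601 timestamp such as
    2021-01-12T11:15:57.584211-07:00.
    """
    kept, stamps = [], []
    for line in lines:
        if NOISE_RE.search(line):
            stamps.append(datetime.fromisoformat(line.split()[0]))
        else:
            kept.append(line)
    return kept, stamps


def mean_interval(stamps):
    """Average seconds between consecutive noise messages, or None if < 2."""
    if len(stamps) < 2:
        return None
    return (stamps[-1] - stamps[0]).total_seconds() / (len(stamps) - 1)


sample = [
    "2021-01-12T11:15:57.584211-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 76.",
    "2021-01-12T11:15:57.966715-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 106.",
    "2021-01-12T11:15:58.349326-07:00 nss-vmh01 corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring 22.",
    # Made-up, non-TOTEM line to show what is kept.
    "2021-01-12T11:15:58.400000-07:00 nss-vmh01 example[1]: some other message",
]

kept, stamps = split_totem_noise(sample)
print(len(stamps), "noise lines, one every", round(mean_interval(stamps), 2), "s")
# prints: 3 noise lines, one every 0.38 s
```

Against the real file, with open("/var/log/messages") as fh: kept, stamps = split_totem_noise(fh) would work. The steady rate of roughly one warning every 0.38 s matches the cadence in the log above. Filtering only makes the log readable. Corosync emits these warnings when malformed packets reach it, so the underlying cause still has to be found.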

corosync.conf file:
**
nss-vmh02:/data # cat /etc/corosync/corosync.conf

# /etc/corosync/corosync.conf file autogenerated by YaST2.

# Manually changed configurations may get lost when reconfigured by YaST2.

totem {
    #The maximum number of messages that may be sent by one processor on receipt of the token.
    max_messages: 20

    #Used for mutual node authentication
    crypto_cipher:  none

    #HMAC/SHA1 should be used to authenticate all message
    secauth:        off

    #How many token retransmits should be attempted before forming a new configuration.
    token_retransmits_before_loss_const:    10

    interface {
            #Network Address to be bind for this interface setting
            bindnetaddr:    192.168.12.0

            #The ringnumber assigned to this interface setting
            ringnumber:     0

            #The multicast port to be used
            mcastport:      5405

            #Time-to-live for cluster communication packets
            ttl:    1

            #The multicast address to be used
            mcastaddr:      239.12.12.12

    }
    interface {
            #The ringnumber assigned to this interface setting
            ringnumber:     1

            #The multicast address to be used
            mcastaddr:      239.11.11.11

            #Network Address to be bind for this interface setting
            bindnetaddr:    192.168.11.0

            #The multicast port to be used
            mcastport:      5405

    }
    #How long to wait for consensus to be achieved before starting a new round of membership configuration.
    consensus:      6000

    #Transport protocol
    transport:      udp

    #Used for mutual node authentication
    crypto_hash:    none

    #The mode for redundant ring. None is used when only 1 interface specified, otherwise, only active or passive may be choosen
    rrp_mode:       passive

    #How long to wait for join messages in membership protocol. in ms
    join:   60

    #This specifies the name of cluster
    cluster_name:   nss-cluster1

    #Timeout for a token lost. in ms
    token:  5000

    #The only valid version is 2
    version:        2

    #To make sure the auto-generated nodeid is positive
    clear_node_high_bit:    yes

    #Specifies version of IP to use for communication. Value can be one of ipv4 or ipv6.
    ip_version:     ipv4

}
logging {
    #Log to a specified file
    to_logfile: yes

    #Log to be saved in this specified file
    logfile:        /var/log/cluster/corosync.log

    #Log timestamp as well
    timestamp:      on

    #Facility in syslog
    syslog_facility:        daemon

    logger_subsys {
            #Enable debug for this logger.
            debug:  off

            #This specifies the subsystem identity (name) for which logging is specified
            subsys: QUORUM

    }
    #Log to syslog
    to_syslog:      yes

    #Whether or not turning on the debug information in the log
    debug:  off

    #Log to the standard error output
    to_stderr:      no

    #Logging file line in the source code as well
    fileline:       off

}
quorum {
    #votequorum requires an expected_votes value to function
    expected_votes: 3

    #Enables two node cluster operations
    two_node:       0

    #Enable and configure quorum subsystem
    provider:       corosync_votequorum

}
**
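Corosync is reporting malformed packets, so one configuration-level cause worth ruling out is two rings sharing a multicast group and port. According to the corosync.conf(5) man page for corosync 2.x, UDP multicast uses both mcastport and mcastport - 1. Below is a minimal Python sketch that reads the interface { } blocks laid out as above and flags ring pairs on the same mcastaddr whose port pairs overlap. ring_interfaces and multicast_clashes are hypothetical helper names. The check assumes each block sets ringnumber, mcastaddr and mcastport, as this config does. It only sees rings defined in this one file, not other clusters on the network.

```python
import re

# interface { ... } blocks contain no nested braces, so a non-greedy match
# up to the first "}" captures one block body.
BLOCK_RE = re.compile(r"interface\s*\{(.*?)\}", re.DOTALL)
KV_RE = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def ring_interfaces(conf_text):
    """Return {ringnumber: {option: value}} for every interface block."""
    rings = {}
    for body in BLOCK_RE.findall(conf_text):
        # Drop "#..." comment lines before collecting "key: value" pairs.
        lines = [l for l in body.splitlines() if not l.strip().startswith("#")]
        opts = dict(KV_RE.findall("\n".join(lines)))
        rings[int(opts["ringnumber"])] = opts
    return rings


def multicast_clashes(rings):
    """Ring pairs on the same mcastaddr whose (mcastport - 1, mcastport) overlap."""
    clashes = []
    nums = sorted(rings)
    for i, a in enumerate(nums):
        for b in nums[i + 1:]:
            ra, rb = rings[a], rings[b]
            if ra.get("mcastaddr") is None or ra.get("mcastaddr") != rb.get("mcastaddr"):
                continue
            pa, pb = int(ra["mcastport"]), int(rb["mcastport"])
            if {pa - 1, pa} & {pb - 1, pb}:
                clashes.append((a, b))
    return clashes


# The two interface blocks from the config above (other options trimmed).
conf = """
totem {
    interface {
        #The ringnumber assigned to this interface setting
        ringnumber:     0
        bindnetaddr:    192.168.12.0
        mcastaddr:      239.12.12.12
        mcastport:      5405
    }
    interface {
        ringnumber:     1
        bindnetaddr:    192.168.11.0
        mcastaddr:      239.11.11.11
        mcastport:      5405
    }
}
"""

print(multicast_clashes(ring_interfaces(conf)))
# prints: []
```

For this configuration the result is empty because the two rings use different multicast groups (239.12.12.12 and 239.11.11.11), so the shared port 5405 is not a conflict.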

Any help, suggestions, comments?
Thanks

johngoutbeck · February 5, 2021, 6:45pm

Hello all,
After a lot of troubleshooting, this issue turned out to be a bad port on the switch carrying the redundant ring. After moving the connection to another port on that switch, these messages went away:
corosync[33443]: [TOTEM ] Type of received message is wrong... ignoring -22.
corosync[33443]: [TOTEM ] Received message corrupted... ignoring
nss-vmh02:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 1084755186
RING ID 0
id = 192.168.12.242
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.242
status = ring 1 active with no faults
nss-vmh02:~ # crm cluster status
Name: nss-cluster1

Services:
corosync active/running/disabled
pacemaker active/running/disabled

Printing ring status.
Local node ID 1084755186
RING ID 0
id = 192.168.12.242
status = ring 0 active with no faults
RING ID 1
id = 192.168.11.242
status = ring 1 active with no faults

Thank you all for looking at this issue.
Have a good day.
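Since the root cause here was a bad switch port, the NIC's own receive-error counters are a cheap first check if this recurs. On Linux, each interface exposes counters such as rx_errors and rx_crc_errors under /sys/class/net/<iface>/statistics/. Below is a minimal Python sketch that snapshots a few of those counters and reports which ones grew. It assumes that sysfs layout. read_counters and counter_deltas are hypothetical helpers, and the counter list is an illustrative choice, not something taken from this thread. Drivers differ in which counters they actually increment.

```python
import os

# Standard Linux per-interface counters; some drivers leave some at zero.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_frame_errors", "rx_dropped")


def read_counters(iface, root="/sys/class/net"):
    """Snapshot selected receive-error counters for one interface."""
    stats_dir = os.path.join(root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        path = os.path.join(stats_dir, name)
        if os.path.exists(path):  # skip counters this system does not expose
            with open(path) as fh:
                values[name] = int(fh.read().strip())
    return values


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

For example, take before = read_counters("eth1"), wait while the TOTEM warnings are being logged, then call counter_deltas(before, read_counters("eth1")). Here "eth1" is a placeholder for whichever NIC holds the 192.168.11.x ring address. A steadily growing rx_crc_errors on that NIC would be consistent with a cabling or switch-port fault rather than a corosync problem.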
