Hi Gert,
[QUOTE=gertsogaard;10611]First, thanks for your response ;o)
OS version, oh yes I forgot, sorry.
SLES11SP2 and the corresponding HA software.[/QUOTE]
I have to admit that even my test cluster is still at SP1 :[
[QUOTE=gertsogaard;10611]are both nodes set up with the correct interface definitions? Yes
Same multicast port & address, same network address? Yes
Is firewalling currently disabled to rule out errors in that department? Yes
/etc/corosync/corosync.conf for node 1: […][/QUOTE]
Yes, that definitely looks good so far.
The logging statement is set up for syslog, so you might find some useful information there… though I expect it to be buried beneath lots of other messages; HA tends to be quite verbose once you’re trying to get some useful information out of it.
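To cut through the noise, you could filter syslog for the corosync/TOTEM messages, something along these lines (assuming the default /var/log/messages destination, adjust to your syslog setup):

cluster02:~ # grep -E "corosync|TOTEM" /var/log/messages | tail -20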
I’d run a “tcpdump -i <your cluster node’s Ethernet interface name goes here> -nvv port 5405 and host 226.94.1.1”, which should give you quite a bit of output like
cluster02:~ # tcpdump -i eth0 -nvv port 5405 and host 239.103.103.0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
16:17:12.149368 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:12.542538 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:12.936567 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:13.332483 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:13.724467 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:14.124646 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:14.514524 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:14.909659 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:15.301609 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
16:17:15.698635 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 147) 10.0.99.1.5404 > 239.103.103.0.5405: UDP, length 119
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
of course with your mcast target address instead of mine. (You can stop the trace via Ctrl-C.)
[QUOTE=gertsogaard;10611]Hope this clarifies the situation?
Regards
Gert[/QUOTE]
If you see mcast traffic on each node (originating from the respective other node, of course), you can run a quick check on the “ring status” via “corosync-cfgtool -s”. If this reports “status = ring 0 active with no faults”, then it’s time to move up in the layers; if not, maybe syslog can help identify the root cause.
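For reference, a healthy ring on my test node looks roughly like this (from memory, the exact wording may differ slightly between corosync versions, and your node ID and ring address will of course differ):

cluster02:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 739246081
RING ID 0
	id	= 10.0.99.1
	status	= ring 0 active with no faults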
If you don’t see the required multicast traffic, then something is bogus at the network layer.
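If you want to verify multicast delivery independently of corosync, “omping” can help with that, though I don’t think it ships with SLES11, so treat this as a sketch: you’d run it on both nodes at the same time with your mcast address/port combination (stop the cluster stack first via “rcopenais stop” so the port is free):

cluster01:~ # omping -m 226.94.1.1 -p 5405 cluster01 cluster02
cluster02:~ # omping -m 226.94.1.1 -p 5405 cluster01 cluster02

If unicast responses arrive but multicast ones don’t, IGMP snooping on the switch is a usual suspect.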
Oh, and I assume that your test cluster is either in a totally different IP network / multicast domain or is using a different mcast address/port combination.
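Just to illustrate which knobs I mean: in /etc/corosync/corosync.conf, that’s the interface section within totem, shown here with the values from my test setup, not a recommendation:

totem {
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.99.0
                mcastaddr: 239.103.103.0
                mcastport: 5405
        }
}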
Regards,
Jens