Dear forum,
I’m having a bit of trouble with my firewall and MPI configuration. I have two cluster nodes, connected via both Ethernet and InfiniBand, that should jointly run a simulation with either Platform MPI or Intel MPI, but it doesn’t work (duh). From what I’ve seen, it looks as if the firewall is still blocking ports even though it’s switched off, the interfaces are assigned to the internal zone, or I explicitly open ports.
Here’s the result of a port scan from slave to master:
n001:~ # netcat -zv 192.168.20.1 1-50000
admin-ib.default.domain [192.168.20.1] 44322 (pmcdproxy) open
admin-ib.default.domain [192.168.20.1] 44321 (pmcd) open
admin-ib.default.domain [192.168.20.1] 43483 (?) open
admin-ib.default.domain [192.168.20.1] 39828 (?) open
admin-ib.default.domain [192.168.20.1] 37392 (?) open
admin-ib.default.domain [192.168.20.1] 37194 (?) open
admin-ib.default.domain [192.168.20.1] 35288 (?) open
admin-ib.default.domain [192.168.20.1] 15007 (?) open
admin-ib.default.domain [192.168.20.1] 15004 (pbs_sched) open
admin-ib.default.domain [192.168.20.1] 15001 (pbs) open
admin-ib.default.domain [192.168.20.1] 7890 (?) open
admin-ib.default.domain [192.168.20.1] 4673 (cxws) open
admin-ib.default.domain [192.168.20.1] 4672 (rfa) open
admin-ib.default.domain [192.168.20.1] 2049 (nfs) open
admin-ib.default.domain [192.168.20.1] 777 (multiling-http) open
admin-ib.default.domain [192.168.20.1] 737 (?) open
admin-ib.default.domain [192.168.20.1] 682 (xfr) open
admin-ib.default.domain [192.168.20.1] 111 (sunrpc) open
admin-ib.default.domain [192.168.20.1] 22 (ssh) open
The MPI job fails with an error message like this (XXX is my master):
[proxy:0:1@n001] HYDU_sock_connect (./utils/sock/sock.c:227): unable to get host address for XXX (2)
[proxy:0:1@n001] main (./pm/pmiserv/pmip.c:396): unable to connect to server XXX at port 34272 (check for firewalls!)
The odd thing is that if I run the port scan again, port 34272 is now listed as open. However, that doesn’t help, since MPI picks random ports on every start.
What’s going on? Is another part of the system blocking ports (I can close all ports with my firewall, but I can’t seem to open additional ones)? Could it be the switch?
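One more thing I noticed: the first line of the error says "unable to get host address for XXX", which reads more like a name-resolution failure on n001 than a blocked port. Here is a minimal sketch of how that could be checked on the compute node, assuming a glibc-based Linux where getent is available. The helper name resolves is made up for illustration, and XXX is the same placeholder for my master's host name as in the error message.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given host name resolves on this node
# (via /etc/hosts, DNS, or whatever nsswitch.conf is configured to use).
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# XXX is a placeholder for the master's host name; pass the real name as $1.
master="${1:-XXX}"

if resolves "$master"; then
    # Print the first address the name maps to.
    echo "$master resolves to: $(getent hosts "$master" | awk '{print $1; exit}')"
else
    echo "$master does not resolve on $(hostname); check /etc/hosts or DNS"
fi
```

If the name does not resolve on n001, the connection attempt would presumably fail before any port is contacted, regardless of the firewall settings.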
Cheers and thanks!