MPI over Infiniband (parallel Fluent on Windows Compute Cluster 2003)

Question
Hi,
I'm running parallel Fluent jobs on Windows Compute Cluster 2003. I can run jobs over Gigabit Ethernet without problems, although the scalability/efficiency is not as good as I expected. The cluster also has Voltaire InfiniBand installed, so I hoped that switching to this interconnect would improve performance.
First I ran the vstat command to check that the InfiniBand is up and working.
Then I ran ipconfig to check the connection details of the InfiniBand adapter.
Next I submitted a job with the MPICH_NETMASK variable set to 192.168.160.0/255.255.255.0 so that the MPI communication would go over the fast connections. Unfortunately my job crashed and gave the output below. Can anyone see from the output what is going wrong?
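(For reference, the checks and the submission described above look roughly like this from a command prompt; the mpiexec command line is a simplified placeholder, since in practice Fluent builds the mpiexec call itself, and the processor count simply matches the 24 ranks in the log.)

REM 1. Check that the InfiniBand ports are up (Voltaire tool)
vstat

REM 2. Check the IP address assigned to the IPoIB adapter
ipconfig /all

REM 3. Submit the job, handing the netmask through to MS-MPI
REM    (simplified placeholder; Fluent normally constructs the mpiexec call itself)
job submit /numprocessors:24 mpiexec -env MPICH_NETMASK 192.168.160.0/255.255.255.0 ^
    \\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe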
Thanks in advance,
Koen
Loading "\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\lib\fl114-64.dmp"
Done.
Warning: -path flag not specified.
Defaulting to -path\\spitfireb-hn\Fluent.Inc.6326
Welcome to Fluent 6.3.26
Copyright 2006 Fluent Inc.
All Rights Reserved
Loading "\\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\lib\flprim1119-64.dmp"
Done.
Host spawning Node 0 on machine "sb-node027" (win64).
You can click CTRL+C to stop the startup process!
Using user defined: -env MPICH_NETMASK 192.168.160.0/255.255.255.0
job aborted:
rank: node: exit code: message
0: SB-NODE027: terminated
1: SB-NODE027: terminated
2: SB-NODE027: terminated
3: SB-NODE027: terminated
4: SB-NODE028: terminated
5: SB-NODE028: terminated
6: SB-NODE028: fatal error: Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76)....................:
MPIC_Sendrecv(142)..................:
MPID_Isend(97)......................: failure occurred while attempting to send an eager message
MPIDI_CH3_iSendv_internal(242)......:
MPIDI_CH3I_Sock_connect(381)........: [ch3:sock] rank 6 unable to connect to rank 8 using business card <port=4829 description="152.78.60.167 192.168.60.164 sb-node025 " shm_host=sb-node025 shm_queue=E939FC22-5262-44ef-A5F5-745EBFA2AC88 >
MPIDU_Sock_post_connect_filter(1258): unable to connect to 152.78.60.167 192.168.60.164 sb-node025 on port 4829, no endpoint matches the netmask 192.168.160.0/255.255.255.0
7: SB-NODE028: fatal error: Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76)....................:
MPIC_Sendrecv(142)..................:
MPID_Isend(97)......................: failure occurred while attempting to send an eager message
MPIDI_CH3_iSendv_internal(242)......:
MPIDI_CH3I_Sock_connect(381)........: [ch3:sock] rank 7 unable to connect to rank 8 using business card <port=4829 description="152.78.60.167 192.168.60.164 sb-node025 " shm_host=sb-node025 shm_queue=E939FC22-5262-44ef-A5F5-745EBFA2AC88 >
MPIDU_Sock_post_connect_filter(1258): unable to connect to 152.78.60.167 192.168.60.164 sb-node025 on port 4829, no endpoint matches the netmask 192.168.160.0/255.255.255.0
8: SB-NODE025: terminated
9: SB-NODE025: terminated
10: SB-NODE025: terminated
11: SB-NODE025: terminated
12: SB-NODE026: terminated
13: SB-NODE026: terminated
14: SB-NODE026: terminated
15: SB-NODE026: terminated
16: SB-NODE024: terminated
17: SB-NODE024: terminated
18: SB-NODE024: terminated
19: SB-NODE024: terminated
20: SB-NODE029: terminated
21: SB-NODE029: terminated
22: SB-NODE029: terminated
23: SB-NODE029: terminated
---- error analysis -----
6: mpi has detected a fatal error and aborted \\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe run on SB-NODE028
7: mpi has detected a fatal error and aborted \\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe run on SB-NODE028
---- error analysis -----
Monday, July 28, 2008 5:06 PM
Answers
Hi K.J,
It looks like your netmask does not match your network; see:
no endpoint matches the netmask 192.168.160.0/255.255.255.0
Your nodes are on the 192.168.60.x network (the error shows 192.168.60.164), so change your netmask to 192.168.60.0/255.255.255.0 (assuming that all of your nodes' IP addresses are 192.168.60.x).
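That is, something like this on the mpiexec line (a sketch; only the netmask value changes, everything else in your command stays the same):

REM corrected netmask: now matches the 192.168.60.x addresses shown in the error
mpiexec -env MPICH_NETMASK 192.168.60.0/255.255.255.0 <rest of the Fluent command unchanged>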
thanks,
.Erez
Proposed as answer by elantz (Microsoft employee), Wednesday, July 30, 2008 5:21 PM
Marked as answer by Don Pattee, Monday, April 13, 2009 5:37 AM
Monday, July 28, 2008 11:07 PM
All replies
I found out that not all nodes on our cluster have the InfiniBand interconnect, and this was causing the problem. When I submit to the nodes that do have InfiniBand installed, everything runs OK.
Koen
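(A rough sketch of how such a job can be pinned to the InfiniBand-equipped nodes at submission time; the /askednodes option name is assumed from the CCS 2003 CLI, and the node names are placeholders to replace with your own IB nodes.)

REM Restrict the job to InfiniBand-equipped nodes only
REM (/askednodes is assumed to be the CCS 2003 option name; node names are placeholders)
job submit /askednodes:ibnode01,ibnode02 /numprocessors:8 ^
    mpiexec -env MPICH_NETMASK 192.168.60.0/255.255.255.0 <rest of the Fluent command>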
Tuesday, July 29, 2008 9:05 AM
Koen,
I work on high-performance networking in Microsoft's HPC team. Can you expand a bit on the configuration of your cluster? I don't see many heterogeneous cluster configurations (such as yours) in the "real world", and I'm interested in understanding more about your cluster and its use.
I hope I'm not being too nosy with all these questions, but as much information as you'd like to share would be greatly appreciated.
1) Why do some nodes have IB and others don't? What were the business or technical needs that drove this configuration?
2) How was the cluster originally configured, and how is it maintained now? (The "in-the-box" HPC Server 2008 management tools are not much help on clusters with heterogeneous networks.)
3) Have you considered using Job Templates and/or node groups in the HPCS2008 scheduler to automatically route jobs to the right nodes (with or without IB, for example)?
4) How many nodes does your cluster have and what applications are run most often?
Thanks,
Eric
(elantz@microsoft.com)
Eric Lantz (Microsoft)
Tuesday, July 29, 2008 5:28 PM
Hi Koen
I'm glad things are working OK. One thing to be aware of is that the version of Fluent you are running has known (and pretty serious) performance issues on Windows; you should see much better scalability with the 6.3.33 version or, ideally, with the 12.0.x versions just recently released.
If you are using Winsock Direct, you should also pass the -env MPICH_SOCKET_SBUFFER_SIZE 0 parameter to your MPI calls; without this flag your bandwidth will be limited to around 250 MB/sec maximum over that transport. With this fix in place and the later versions of Fluent, I would expect scalability to be very good.
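In practice that just means adding one more -env pair to the mpiexec line, roughly like this (a sketch; the rest of the command is unchanged):

REM A send buffer size of 0 avoids the ~250 MB/sec cap over Winsock Direct mentioned above
mpiexec -env MPICH_SOCKET_SBUFFER_SIZE 0 -env MPICH_NETMASK 192.168.60.0/255.255.255.0 ^
    <rest of the Fluent command unchanged>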
Of course, if you have a chance to evaluate our v2 stack and the new NetworkDirect provider, you'll again see very significant improvements at higher scales.
Cheers,
Jeff
Saturday, August 2, 2008 4:55 PM