MPI over Infiniband (parallel Fluent on Windows Compute Cluster 2003)

  • Question

  • Hi,

    I'm running parallel Fluent jobs on Windows Compute Cluster 2003. I am able to run jobs over Gigabit Ethernet without problems, although the scalability/efficiency is not as good as I expected. The cluster also has Voltaire Infiniband installed, so I hoped that switching to this interconnect would improve performance.

    First I ran the vstat command to check that the Infiniband is up and working.

    Then I ran ipconfig to check the connection details of the Infiniband.

    Next I submitted a job with the MPICH_NETMASK variable set to 192.168.160.0/255.255.255.0 so that the MPI communication would go over the fast interconnect. Unfortunately the job crashed and gave the following output. Can anyone see from the output what goes wrong?

    Thanks in advance,

    Koen

    Loading "\\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\lib\fl114-64.dmp"
    Done.
    Warning: -path flag not specified.
    Defaulting to -path\\spitfireb-hn\Fluent.Inc.6326


         Welcome to Fluent 6.3.26

         Copyright 2006 Fluent Inc.
         All Rights Reserved

    Loading "\\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\lib\flprim1119-64.dmp"
    Done.
     
    Host spawning Node 0 on machine "sb-node027" (win64).

    You can click CTRL+C to stop the startup process!

    Using user defined:  -env MPICH_NETMASK 192.168.160.0/255.255.255.0

    job aborted:
    rank: node: exit code: message
    0: SB-NODE027: terminated
    1: SB-NODE027: terminated
    2: SB-NODE027: terminated
    3: SB-NODE027: terminated
    4: SB-NODE028: terminated
    5: SB-NODE028: terminated
    6: SB-NODE028: fatal error: Fatal error in MPI_Barrier: Other MPI error, error stack:
    MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
    MPIR_Barrier(76)....................:
    MPIC_Sendrecv(142)..................:
    MPID_Isend(97)......................: failure occurred while attempting to send an eager message
    MPIDI_CH3_iSendv_internal(242)......:
    MPIDI_CH3I_Sock_connect(381)........: [ch3:sock] rank 6 unable to connect to rank 8 using business card <port=4829 description="152.78.60.167 192.168.60.164 sb-node025 " shm_host=sb-node025 shm_queue=E939FC22-5262-44ef-A5F5-745EBFA2AC88 >
    MPIDU_Sock_post_connect_filter(1258): unable to connect to 152.78.60.167 192.168.60.164 sb-node025  on port 4829, no endpoint matches the netmask 192.168.160.0/255.255.255.0
    7: SB-NODE028: fatal error: Fatal error in MPI_Barrier: Other MPI error, error stack:
    MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
    MPIR_Barrier(76)....................:
    MPIC_Sendrecv(142)..................:
    MPID_Isend(97)......................: failure occurred while attempting to send an eager message
    MPIDI_CH3_iSendv_internal(242)......:
    MPIDI_CH3I_Sock_connect(381)........: [ch3:sock] rank 7 unable to connect to rank 8 using business card <port=4829 description="152.78.60.167 192.168.60.164 sb-node025 " shm_host=sb-node025 shm_queue=E939FC22-5262-44ef-A5F5-745EBFA2AC88 >
    MPIDU_Sock_post_connect_filter(1258): unable to connect to 152.78.60.167 192.168.60.164 sb-node025  on port 4829, no endpoint matches the netmask 192.168.160.0/255.255.255.0
    8: SB-NODE025: terminated
    9: SB-NODE025: terminated
    10: SB-NODE025: terminated
    11: SB-NODE025: terminated
    12: SB-NODE026: terminated
    13: SB-NODE026: terminated
    14: SB-NODE026: terminated
    15: SB-NODE026: terminated
    16: SB-NODE024: terminated
    17: SB-NODE024: terminated
    18: SB-NODE024: terminated
    19: SB-NODE024: terminated
    20: SB-NODE029: terminated
    21: SB-NODE029: terminated
    22: SB-NODE029: terminated
    23: SB-NODE029: terminated

    ---- error analysis -----

    6: mpi has detected a fatal error and aborted \\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe run on SB-NODE028
    7: mpi has detected a fatal error and aborted \\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe run on SB-NODE028

    ---- error analysis -----

    Monday, July 28, 2008 5:06 PM

Answers

  • Hi K.J,

    It seems like your netmask does not match your network; see:

         no endpoint matches the netmask 192.168.160.0/255.255.255.0

    Your node's address is 192.168.60.164, so your network is 192.168.60.x.

    Change your netmask to 192.168.60.0/255.255.255.0 (assuming that all your nodes' IP addresses are 192.168.60.x).
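
    For example, a quick check with Python's ipaddress module shows which of the two netmask values actually covers the addresses in the business card above (a sketch only; the addresses are taken from the error output):

         import ipaddress

         # Addresses from the business card of the failing rank, plus the
         # netmask value that failed and the suggested replacement.
         node_addresses = ["152.78.60.167", "192.168.60.164"]
         candidate_masks = ["192.168.160.0/255.255.255.0",
                            "192.168.60.0/255.255.255.0"]

         for mask in candidate_masks:
             network = ipaddress.ip_network(mask)
             matches = [a for a in node_addresses
                        if ipaddress.ip_address(a) in network]
             print(mask, "->", matches or "no endpoint matches")

    Only 192.168.60.164 falls inside the second mask, which is why the original value found no matching endpoint on that node.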

    thanks,
    .Erez
    Monday, July 28, 2008 11:07 PM

All replies

  • I found out that not all nodes in our cluster have the Infiniband interconnect, and this caused the problem. When I submit to the nodes which do have Infiniband installed, everything runs OK.

    Koen
    Tuesday, July 29, 2008 9:05 AM
  • Koen,
    I work on high-performance networking in Microsoft's HPC team. Can you expand a bit on the configuration of your cluster? I don't see many heterogeneous cluster configurations (such as yours) in the "real world", and I'm interested in understanding more about your cluster and its use.

    I hope I'm not being too nosy with all these questions; any information you'd like to share is greatly appreciated.

    1) Why do some nodes have IB and others don't? What were the business or technical needs that drove this configuration?
    2) How was the cluster originally configured, and how is it maintained now? (The "in-the-box" HPC Server 2008 management tools are not much help on clusters with heterogeneous networks.)
    3) Have you considered using Job Templates and/or node groups in the HPCS2008 scheduler to automatically route jobs to the right nodes (with or without IB, for example)? 
    4) How many nodes does your cluster have and what applications are run most often? 

    Thanks,
    Eric
    (elantz@microsoft.com)
    Eric Lantz (Microsoft)
    Tuesday, July 29, 2008 5:28 PM
  • Hi Koen

    I'm glad things are working OK. One thing to be aware of is that the version of Fluent you are running has known (and pretty serious) performance issues on Windows; you should see much better scalability with the 6.3.33 version or, ideally, with the recently released 12.0.x versions.

    If you are using Winsock Direct, you should also pass the -env MPICH_SOCKET_SBUFFER_SIZE 0 parameter to your MPI calls; without this flag you will limit your bandwidth to around 250 MB/sec maximum over that transport. With this fix in place and the later versions of Fluent, I would expect scalability to be very good.
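
    Purely as a sketch (Fluent assembles the real mpiexec command itself; this only shows where the flag sits, using the -env NAME VALUE syntax visible in the job output above; the executable path is the one from the error analysis and the remaining Fluent arguments are omitted):

         import subprocess

         # Hypothetical launcher: in practice Fluent builds and runs this
         # mpiexec command for you; shown here only to illustrate the flags.
         cmd = [
             "mpiexec",
             "-env", "MPICH_NETMASK", "192.168.160.0/255.255.255.0",  # IB subnet, as in the original submission
             "-env", "MPICH_SOCKET_SBUFFER_SIZE", "0",                # lift the ~250 MB/sec Winsock Direct cap
             r"\\spitfireb-hn\Fluent.Inc.6326\fluent6.3.26\win64\3ddp_node\fl_mpi6326.exe",
             # ...remaining Fluent node arguments...
         ]
         subprocess.run(cmd, check=True)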

    Of course, if you have a chance to evaluate our V2 stack and the new NetworkDirect provider, you'll again see very significant improvements at higher scales.

    cheers
    jeff


    Saturday, August 2, 2008 4:55 PM