Different MPI Connection issue

  • Question

  • Running with MS-MPI on Windows Server 2003, I get an error when trying to send and receive across nodes (boxes). This appears to be a different message from the other connection issue reported on this forum. Any pointers appreciated.
     

    job aborted:
    rank: node: exit code: message
    0: HPC-29: fatal error: Fatal error in MPI_Send: Other MPI error, error stack:
    MPI_Send(172).......................: MPI_Send(buf=0x000000000012ED40, count=168, MPI_CHAR, dest=8, tag=0, MPI_COMM_WORLD) failed
    MPID_Send(146)......................: failure occurred while attempting to send an eager message
    MPIDI_CH3_iStartMsgv_internal(262)..:
    MPIDI_CH3I_Sock_connect(381)........: [ch3:sock] rank 0 unable to connect to rank 8 using business card <port=1241 description="169.10.16.48 192.0.2.128 169.254.90.25 hpc-28 " shm_host=hpc-28 shm_queue=398253CB-31B7-48af-A60D-951F99CE36A8 >
    MPIDU_Sock_post_connect_filter(1258): unable to connect to 169.10.16.48 192.0.2.128 169.254.90.25 hpc-28  on port 1241, no endpoint matches the netmask 192.168.0.0/255.255.0.0
    1: HPC-29: terminated
    2: HPC-29: terminated
    3: HPC-29: terminated
    4: HPC-29: terminated
    5: HPC-29: terminated
    6: HPC-29: terminated
    7: HPC-29: terminated
    8: HPC-28: terminated
    9: HPC-28: terminated
    10: HPC-28: terminated
    11: HPC-28: terminated
    12: HPC-28: terminated
    13: HPC-28: terminated
    14: HPC-28: terminated
    15: HPC-28: terminated
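    The key line is the last one in the error stack: rank 0 rejects every address in rank 8's business card (169.10.16.48, 192.0.2.128, 169.254.90.25) because none falls inside the configured netmask 192.168.0.0/255.255.0.0, so no connection can be attempted. The netmask typically comes from a cluster-side setting such as the MPICH_NETMASK environment variable (an assumption here; check your cluster configuration). A minimal sketch of the check the runtime is effectively performing:

    ```python
    import ipaddress

    # Netmask reported in the error message; assumed to come from a setting
    # like MPICH_NETMASK on the cluster (hypothetical for this sketch).
    netmask = ipaddress.ip_network("192.168.0.0/255.255.0.0")

    # Addresses advertised in rank 8's business card, from the log above.
    candidates = ["169.10.16.48", "192.0.2.128", "169.254.90.25"]

    for addr in candidates:
        in_subnet = ipaddress.ip_address(addr) in netmask
        print(f"{addr}: {'matches' if in_subnet else 'no match'}")
    # All three print "no match", which is exactly why the connect fails.
    ```

    Since none of the node's interfaces is on a 192.168.x.x network, the fix is presumably either to point the netmask setting at a subnet the nodes actually share (e.g. the 169.10.x.x network, if that is the intended MPI fabric), or to give the nodes addresses inside the configured subnet.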

    Wednesday, October 29, 2008 10:00 PM

Answers

All replies