MPI_FINALIZE fails with error: rank 0 unable to connect to rank 1

  • Question

  • Hi all,

    I have MPI Fortran code that I would like to run on two computers, one at home and one at work, connected through a VPN. I run mpiexec with two hosts and get the output from the program, but MPI_Finalize crashes with the error message above:

    job aborted:
    rank: node: exit code[: error message]
    0: <IP rank0>: 1: Fatal error in MPI_Finalize: Other MPI error, error stack:
    MPI_Finalize(307)............: MPI_Finalize failed
    MPI_Finalize(198)............:
    MPID_Finalize(92)............:
    PMPI_Barrier(476)............: MPI_Barrier(comm=0x44000002) failed
    MPIR_Barrier(82).............:
    MPIC_Sendrecv(158)...........:
    MPID_Isend(116)..............: failure occurred while attempting to send an eager message
    MPIDI_CH3_iSend(175).........:
    MPIDI_CH3I_Sock_connect(1215): [ch3:sock] rank 0 unable to connect to rank 1 using business card <port=2652 description=<rank 1's network here> ifname=<rank 1s IP>>
    MPIDU_Sock_post_connect(1231): unable to connect to ... on port 2652, exhausted all endpoints (errno -1)
    MPIDU_Sock_post_connect(1247): gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
    1: ...: 1

    I set MPICH_NETMASK to the correct IP, but it doesn't seem to make a difference.

    Does anyone have a hint for me? Let me know if you need more information...

    Benjamin
    Saturday, September 12, 2009 9:44 PM

Answers

  • Hi Benjamin,

    Which version of MPI are you using (MS-MPI v1 or v2, or MPICH2)?

    It seems that your name resolution is failing; try using the IP addresses of your machines on the mpiexec command line instead of the host names.

    .Erez

    Monday, September 14, 2009 10:24 PM
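
The name-resolution check suggested in the answer can be tried outside MPI. Errno 11004 in the log is the Winsock WSANO_DATA code, which means the name is known but has no address record of the requested type. The sketch below is only a diagnostic aid, not part of either MPI implementation: the function name `check_host`, the script name, and the host names passed on the command line are hypothetical. Run on each machine, it reports whether the other machine's name resolves to an IPv4 address and, if a port is given, whether a TCP connection can be opened.

```python
import socket
import sys


def check_host(name, port=None, timeout=3.0):
    """Resolve `name` to IPv4 addresses and optionally try a TCP connect.

    Hypothetical helper for illustration; returns a dict describing what
    happened rather than raising, so it can be used on several hosts in a row.
    """
    report = {
        "name": name,
        "addresses": [],
        "resolve_error": None,
        "connected": None,      # stays None when no port is given
        "connect_error": None,
    }
    try:
        # Same kind of lookup the MPI socket channel needs: name -> IPv4 address.
        infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        report["resolve_error"] = str(exc)
        return report

    report["addresses"] = sorted({info[4][0] for info in infos})

    if port is not None:
        report["connected"] = False
        # Try each resolved address until one accepts the connection.
        for addr in report["addresses"]:
            try:
                with socket.create_connection((addr, port), timeout=timeout):
                    report["connected"] = True
                    report["connect_error"] = None
                    break
            except OSError as exc:
                report["connect_error"] = str(exc)
    return report


if __name__ == "__main__":
    # Usage (assumed script name): python check_hosts.py <host> [<host> ...]
    for host in sys.argv[1:]:
        print(check_host(host))
```

If a host name shows a `resolve_error` here while its numeric IP address works, that points to the same name-resolution problem, and passing the IP addresses to mpiexec is the quickest way around it.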