
Question: MPI problem

  • Friday, November 06, 2009 6:10 AM, yyalli

    Hi. I have an MPI job that fails with the error below. It runs without problems on a single node with multiple processes, but whenever I run it across multiple nodes I get this error. My HPC cluster uses InfiniBand. Is there an option I should set for this type of network? I don't know where to look.

    job aborted:
    [ranks] message

    [0] fatal error
    Fatal error in MPI_Scatterv: Other MPI error, error stack:
    MPI_Scatterv(358).......................: MPI_Scatterv(sbuf=0x06F47100, scnts=0x02138718, displs=0x01E21180, MPI_DOUBLE, rbuf=0x01C19660, rcount=1, MPI_DOUBLE, root=0, comm=0x84000001) failed
    MPIR_Scatterv(119)......................:
    MPIC_Send(39)...........................:
    MPIC_Wait(277)..........................:
    CH3_ND::CCq::Poll(136)..................:
    CH3_ND::CEndpoint::RecvSucceeded(1476)..:
    CH3_ND::CEndpoint::ProcessReceives(1120):
    CH3_ND::CEndpoint::ProcessDataMsg(1281).:
    MPIDI_CH3_RndvSend(271).................: failure occurred while attempting to send message data
    CH3_ND::CEndpoint::ProcessSends(869)....:
    CH3_ND::CEnvironment::CreateMr(490).....:
    CH3_ND::CMr::Create(91).................:
    CH3_ND::CMr::Init(66)...................:
    CH3_ND::CAdapter::RegisterMemory(293)...: [ch3:nd] INDAdapter::RegisterMemory failed with 0xc0000001


    [1-4] terminated
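
    The error stack above is printed from the outermost call (MPI_Scatterv) down to the innermost one, so the last frame is where the failure is actually reported. As a rough aid for reading traces like this, here is a minimal Python sketch that pulls out the frames and the status code from the last one. It assumes the trace has been saved as plain text in the same "Name(line)....: message" layout; `parse_stack` and `innermost_failure` are hypothetical helper names and not part of any MPI tool.

```python
import re

# Frame lines look like "Name(line)....: optional message".
FRAME_RE = re.compile(r"^\s*([\w:]+)\((\d+)\)\.*:\s*(.*)$")
# Hexadecimal status codes such as 0xc0000001.
CODE_RE = re.compile(r"0x[0-9a-fA-F]+")


def parse_stack(text):
    """Return (function, line, message) tuples in the order they were printed."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            frames.append((match.group(1), int(match.group(2)), match.group(3).strip()))
    return frames


def innermost_failure(text):
    """Return (function, line, last hex code in its message) for the last frame."""
    frames = parse_stack(text)
    if not frames:
        return None
    func, line, message = frames[-1]
    codes = CODE_RE.findall(message)
    return func, line, (codes[-1] if codes else None)


# Tail of the trace from this post, copied verbatim.
SAMPLE = """\
MPIDI_CH3_RndvSend(271).................: failure occurred while attempting to send message data
CH3_ND::CEndpoint::ProcessSends(869)....:
CH3_ND::CEnvironment::CreateMr(490).....:
CH3_ND::CMr::Create(91).................:
CH3_ND::CMr::Init(66)...................:
CH3_ND::CAdapter::RegisterMemory(293)...: [ch3:nd] INDAdapter::RegisterMemory failed with 0xc0000001
"""

if __name__ == "__main__":
    print(innermost_failure(SAMPLE))
```

    For this trace the sketch points at the memory registration call in the ch3:nd layer rather than at MPI_Scatterv itself, which is the part I would like advice on.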

    Any advice or help would be greatly appreciated.

    Thanks,
    Jong