Hi. I have an MPI job that fails with the error below. It runs without problems on one node with multiple processes, but whenever I use multiple nodes I always get this error. My HPC clusters use InfiniBand. Is there an option I should set for this type of network? I don't know where to start looking.
job aborted:
[ranks] message
[0] fatal error
Fatal error in MPI_Scatterv: Other MPI error, error stack:
MPI_Scatterv(358).......................: MPI_Scatterv(sbuf=0x06F47100, scnts=0x02138718, displs=0x01E21180, MPI_DOUBLE, rbuf=0x01C19660, rcount=1, MPI_DOUBLE, root=0, comm=0x84000001) failed
MPIR_Scatterv(119)......................:
MPIC_Send(39)...........................:
MPIC_Wait(277)..........................:
CH3_ND::CCq::Poll(136)..................:
CH3_ND::CEndpoint::RecvSucceeded(1476)..:
CH3_ND::CEndpoint::ProcessReceives(1120):
CH3_ND::CEndpoint::ProcessDataMsg(1281).:
MPIDI_CH3_RndvSend(271).................: failure occurred while attempting to send message data
CH3_ND::CEndpoint::ProcessSends(869)....:
CH3_ND::CEnvironment::CreateMr(490).....:
CH3_ND::CMr::Create(91).................:
CH3_ND::CMr::Init(66)...................:
CH3_ND::CAdapter::RegisterMemory(293)...: [ch3:nd] INDAdapter::RegisterMemory failed with 0xc0000001
[1-4] terminated
Any advice or help will be greatly appreciated.
Thanks,
Jong