mpi_alltoallv deadlock/error when used to re-distribute matrix
-
April 6, 2010 15:04
Could someone kindly help me redistribute a matrix with mpi_alltoallv or any other function in Intel Fortran? Thanks in advance!
Say I have a matrix A(I,J) with I=2 and J=78.
Currently, A is distributed over 3 nodes along I:
node 1: A(1,1:J)
node 2: A(1,1:J)
node 3: none
I want to distribute it to:
node 1: A(1:I, 1:26)
node 2: A(1:I, 1:26)
node 3: A(1:I, 1:26)
I called the function like this:
mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)
I checked the arguments on each node:
node1: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(0 0 0)
node2: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(26 26 26)
node3: np0=(0 0 0), dp0=(0 0 0), np1=(0 0 0), dp1=(52 52 52)
where B0((j-1)*I+i)=A(i,j)
However, the program deadlocks!
================================================================
I also tested another case:
I=5; J=78 with two nodes:
A distribution:
node1: A(1:3,1:78)
node2: A(1:2,1:78)
I want to re-distribute it to:
node1: A(1:5,1:39)
node2: A(1:5,1:39)
Again with B0((j-1)*I+i)=A(i,j) :
mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)
I checked the arguments on each node:
node1: np0=(117 117), dp0=(0 117), np1=(117 117), dp1=(0 0)
node2: np0=(78 78), dp0=(0 78), np1=(78 78), dp1=(117 117)
This time, it gave an error:
job aborted:
rank: node: exit code[: error message]
node0:
node1: 13: Fatal error in MPI_Alltoallv: Pending request (no error), error stack:
MPI_Alltoallv(576): MPI_Alltoallv(sbuf=0x000000000517C690, scnts=0x0000000002EECC20, sdispls=0x0000000002EECBC0, MPI_DOUBLE_COMPLEX, rbuf=0x0000000004F42470, rcnts=0x0000000002EECB60, rdispls=0x0000000002EECB00, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD) failed
(unknown)(): Pending request (no error)
All replies
-
April 9, 2010 0:26
Looking at the first case you showed: np1 is the recvcounts array, which specifies the maximum number of elements that can be received from each process. You set it to (0 0 0) for node3, even though node1 and node2 each send 26 elements to node3. Should it be changed to (26 26 0)? dp1 is the displacement array for the receive buffer. For node3 you set it to (52 52 52), but nothing is sent from node3. Could you check that all the parameters are set consistently and then try again?
Thanks,
James
- Marked as answer by Don Pattee (Moderator), January 12, 2011 2:51
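The suggestion above can be checked on paper before touching the MPI code. For MPI_Alltoallv, the count that rank i sends to rank j should equal the count that rank j expects from rank i, so the receive counts are the transpose of the send counts. The sketch below is a standalone Python check, not MPI code. The count tables are copied from the question (node1 is treated as rank 0), while the helper names `transpose`, `displacements`, and `mismatches` are hypothetical and introduced only for illustration. Packing the received blocks back to back is one possible choice of receive displacements, assumed here, not something the original poster stated.

```python
# Standalone check of MPI_Alltoallv count tables (no MPI required).
# Row r of each table holds the array passed by rank r (node1 -> rank 0).

def transpose(sendcounts):
    """Receive counts implied by the send counts: recv[j][i] == send[i][j]."""
    n = len(sendcounts)
    return [[sendcounts[i][j] for i in range(n)] for j in range(n)]

def displacements(counts):
    """Exclusive prefix sum: non-overlapping offsets into a packed buffer."""
    displs, offset = [], 0
    for c in counts:
        displs.append(offset)
        offset += c
    return displs

def mismatches(sendcounts, recvcounts):
    """Return (sender, receiver, sent, expected) where the counts disagree."""
    n = len(sendcounts)
    return [(i, j, sendcounts[i][j], recvcounts[j][i])
            for i in range(n) for j in range(n)
            if sendcounts[i][j] != recvcounts[j][i]]

# First case from the question: 3 ranks, np0 and np1 as posted.
send1 = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]
recv1_posted = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]

# Second case from the question: 2 ranks, np0 and np1 as posted.
send2 = [[117, 117], [78, 78]]
recv2_posted = [[117, 117], [78, 78]]

for name, send, posted in (("case 1", send1, recv1_posted),
                           ("case 2", send2, recv2_posted)):
    print(name, "mismatches (sender, receiver, sent, expected):")
    for m in mismatches(send, posted):
        print("  ", m)
    for rank, counts in enumerate(transpose(send)):
        print("   rank", rank, "np1 =", counts, "dp1 =", displacements(counts))
```

With the posted tables, the check reports four mismatched pairs in the first case and two in the second. The transposed tables give np1 = (26 26 0) on every rank in the first case, which agrees with the value suggested above for node3, and np1 = (117 78) on both ranks in the second case. The matching packed displacements are dp1 = (0 26 52) and dp1 = (0 117). The received blocks may still need a local reorder to recover the A(i,j) layout, since each block carries different rows.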
-
May 18, 2012 17:20
Hi, I am getting the same error you had before:
Fatal error in MPI_Alltoallv: Pending request (no error), error stack
Did you solve your problem? If so, and if you still remember, what was the problem in your case?