mpi_alltoallv deadlock/error when used to re-distribute matrix
-
April 6, 2010 15:04
Could someone kindly help me redistribute a matrix with mpi_alltoallv or any other function in Intel Fortran? Thanks in advance!
Say I have a matrix A(I,J) with I=2 and J=78.
Currently, A is distributed over 3 nodes along I:
node 1: A(1,1:J)
node 2: A(1,1:J)
node 3: none
I want to distribute it to:
node 1: A(1:I, 1:26)
node 2: A(1:I, 1:26)
node 3: A(1:I, 1:26)
I used the following call:
mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)
I checked the arguments on each node:
node1: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(0 0 0)
node2: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(26 26 26)
node3: np0=(0 0 0), dp0=(0 0 0), np1=(0 0 0), dp1=(52 52 52)
where B0((j-1)*I+i)=A(i,j)
However, the program deadlocks!
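One way to sanity-check the arguments before calling mpi_alltoallv is to compare the counts across ranks: the number of elements rank i sends to rank j has to equal the number rank j expects from rank i. The following is a minimal plain-Python sketch (no MPI involved) that applies this check to the values listed above. The function name `find_count_mismatches` is made up for this illustration, and ranks are 0-based, so "node1" in the post is rank 0.

```python
def find_count_mismatches(sendcounts, recvcounts):
    """Return (sender, receiver, sent, expected) for every inconsistent pair.

    sendcounts[i][j]: elements rank i sends to rank j (np0 on rank i).
    recvcounts[j][i]: elements rank j expects from rank i (np1 on rank j).
    """
    n = len(sendcounts)
    mismatches = []
    for i in range(n):
        for j in range(n):
            if sendcounts[i][j] != recvcounts[j][i]:
                mismatches.append((i, j, sendcounts[i][j], recvcounts[j][i]))
    return mismatches


# Values copied from the post: one row per node.
np0 = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]
np1 = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]

for sender, receiver, sent, expected in find_count_mismatches(np0, np1):
    print(f"rank {sender} sends {sent} to rank {receiver}, "
          f"which expects {expected}")
```

With the posted values this flags four pairs: ranks 0 and 1 each send 26 elements to rank 2, which expects none, and ranks 0 and 1 each expect 26 elements from rank 2, which sends none. How an MPI library reacts to such a mismatch (a hang, a truncation error, or something else) depends on the implementation.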
================================================================
I also tested another case:
I=5, J=78 with two nodes.
Current distribution of A:
node1: A(1:3,1:78)
node2: A(1:2,1:78)
I want to re-distribute it to:
node1: A(1:5,1:39)
node2: A(1:5,1:39)
Again with B0((j-1)*I+i)=A(i,j) :
mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)
I checked the arguments on each node:
node1: np0=(117 117), dp0=(0 117), np1=(117 117), dp1=(0 0)
node2: np0=(78 78), dp0=(0 78), np1=(78 78), dp1=(117 117)
This time, it gave an error:
job aborted:
rank: node: exit code[: error message]
node0:
node1: 13: Fatal error in MPI_Alltoallv: Pending request (no error), error stack:
MPI_Alltoallv(576): MPI_Alltoallv(sbuf=0x000000000517C690, scnts=0x0000000002EECC20, sdispls=0x0000000002EECBC0, MPI_DOUBLE_COMPLEX, rbuf=0x0000000004F42470, rcnts=0x0000000002EECB60, rdispls=0x0000000002EECB00, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD) failed
(unknown)(): Pending request (no error)
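For the second case, the counts can be derived from the two distributions rather than written by hand. The sketch below is plain Python (no MPI) and assumes that each rank holds a block of whole rows before the exchange, owns a block of whole columns afterwards, and that blocks are stored back to back in rank order in both buffers. The helper names `redistribution_counts` and `displacements` are made up for this illustration, and ranks are 0-based.

```python
def redistribution_counts(rows_per_rank, cols_per_rank):
    """Counts for going from a row-block to a column-block distribution.

    rows_per_rank[i]: rows of A held by rank i before the exchange.
    cols_per_rank[j]: columns of A that rank j should own afterwards.
    Returns (send, recv), where send[i] plays the role of np0 on rank i
    and recv[j] the role of np1 on rank j.
    """
    n = len(rows_per_rank)
    # Rank i sends its own rows, restricted to the columns rank j will own.
    send = [[rows_per_rank[i] * cols_per_rank[j] for j in range(n)]
            for i in range(n)]
    # What rank j receives from rank i is exactly what rank i sends to rank j.
    recv = [[send[i][j] for i in range(n)] for j in range(n)]
    return send, recv


def displacements(counts):
    """Exclusive prefix sum: start offset of each block in a packed buffer."""
    displs, offset = [], 0
    for count in counts:
        displs.append(offset)
        offset += count
    return displs


# Second case from the post: 3 + 2 rows, 39 + 39 columns.
send, recv = redistribution_counts([3, 2], [39, 39])
for rank in range(2):
    print(f"rank {rank}: np0={send[rank]} dp0={displacements(send[rank])} "
          f"np1={recv[rank]} dp1={displacements(recv[rank])}")
```

Under these assumptions both ranks end up with np1=(117 78) and dp1=(0 117), which differs from the posted np1=(117 117) on node1 and np1=(78 78) on node2. In an actual MPI program, the receive counts could also be obtained by exchanging the send counts with MPI_Alltoall before calling mpi_alltoallv.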
All replies
-
April 9, 2010 00:26
Looking at the first case you posted: np1 is recvcounts, which specifies the maximum number of elements that can be received from each process. You set it to (0 0 0) on node3. Should it be changed to (26 26 0)? dp1 holds the displacements into the receive buffer. On node3 you set it to (52 52 52), but nothing is sent from node3. Could you check that all the parameters are set properly and then try again?
Thanks,
James
- Marked as answer by Don Pattee (Moderator), January 12, 2011 02:51
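To see what the receive counts and displacements discussed in this reply control, here is a small pure-Python model of the data movement for the first case (I=2, J=78, 3 ranks). It is only a sketch: `simulate_alltoallv` is a made-up helper, not an MPI routine, ranks are 0-based, and the receive displacements (0 26 52) are an assumption chosen so that the blocks from different senders do not overlap.

```python
def simulate_alltoallv(sendbufs, sendcounts, sdispls, recvcounts, rdispls):
    """Model of the exchange: the block from rank i lands at rdispls[j][i]."""
    n = len(sendbufs)
    recvbufs = []
    for j in range(n):
        # Size the receive buffer so that every incoming block fits.
        size = max(rdispls[j][i] + recvcounts[j][i] for i in range(n))
        buf = [None] * size
        for i in range(n):
            count = sendcounts[i][j]
            if count != recvcounts[j][i]:
                raise ValueError(f"rank {i} sends {count} to rank {j}, "
                                 f"which expects {recvcounts[j][i]}")
            start = sdispls[i][j]
            buf[rdispls[j][i]:rdispls[j][i] + count] = \
                sendbufs[i][start:start + count]
        recvbufs.append(buf)
    return recvbufs


J = 78
# Rank 0 holds row 1, rank 1 holds row 2, rank 2 holds nothing.
# Each element is tagged with its global (row, column) index.
sendbufs = [[(1, j) for j in range(1, J + 1)],
            [(2, j) for j in range(1, J + 1)],
            []]
sendcounts = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]
sdispls = [[0, 26, 52], [0, 26, 52], [0, 0, 0]]
recvcounts = [[26, 26, 0]] * 3   # 26 from rank 0, 26 from rank 1, none from rank 2
rdispls = [[0, 26, 52]] * 3      # assumed: blocks packed in sender order

recvbufs = simulate_alltoallv(sendbufs, sendcounts, sdispls,
                              recvcounts, rdispls)
print(recvbufs[2][0], recvbufs[2][26])
```

In this model every rank uses the same receive counts (26 26 0), including ranks 0 and 1. Note also that the received data is grouped by sender: on each rank, B1 holds the 26 elements of row 1 followed by the 26 elements of row 2. That is not the interleaved layout B1((j-1)*I+i)=A(i,j), so a local reordering step (or derived datatypes) would be needed to get that layout.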
-
May 18, 2012 17:20
Hi, I am having the same error you had before:
Fatal error in MPI_Alltoallv: Pending request (no error), error stack
Did you solve your problem? If so, and if you still remember, what was the problem in your case?