mpi_alltoallv deadlock/error when used to redistribute a matrix

  • Question


  • Could someone kindly help me redistribute a matrix with mpi_alltoallv or any other function in Intel Fortran? Thanks in advance!

    Say I have a matrix A(I,J) with I=2 and J=78.

    Currently, A is distributed over 3 nodes along I:

    node 1:  A(1,1:J)
    node 2:  A(1,1:J)
    node 3: none

    I want to redistribute it along J to:

    node 1: A(1:I, 1:26)
    node 2: A(1:I, 1:26)
    node 3: A(1:I, 1:26)


    I used the following call:

    call mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)

    I checked the arguments on each node:
    node1: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(0 0 0)
    node2: np0=(26 26 26), dp0=(0 26 52), np1=(26 26 26), dp1=(26 26 26)
    node3: np0=(0 0 0),      dp0=(0 0 0),    np1=(0 0 0), dp1=(52 52 52)

    where B0((j-1)*I+i)=A(i,j)

    However, the program deadlocks!
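For reference, the arguments for this redistribution can be derived from the row and column ownership alone. The sketch below is a pure-Python model, not MPI code; the function names are hypothetical, ranks are 0-based (rank 0 = node1), and it assumes the columns are split into contiguous blocks and that each node stores its local rows column-major, as in B0((j-1)*I+i)=A(i,j) with I the local row count.

```python
# Pure-Python model (no MPI calls): builds MPI_Alltoallv arguments for moving
# A(I,J) from a row-block layout to a column-block layout.
# Assumptions: ranks are 0-based (rank 0 = node1), columns are split into
# contiguous blocks, and each rank stores its local rows column-major.

def block_sizes(n, nprocs):
    """Split n items into nprocs contiguous blocks; the first blocks get any remainder."""
    base, rem = divmod(n, nprocs)
    return [base + (1 if p < rem else 0) for p in range(nprocs)]


def exclusive_prefix_sum(counts):
    """Displacement of each block when the blocks are packed back to back."""
    displs, total = [], 0
    for c in counts:
        displs.append(total)
        total += c
    return displs


def alltoallv_args(rows_per_rank, ncols):
    """Return {rank: (sendcounts, sdispls, recvcounts, rdispls)}, in elements."""
    nprocs = len(rows_per_rank)
    cols_per_rank = block_sizes(ncols, nprocs)
    args = {}
    for me in range(nprocs):
        # Rank `me` sends its local rows restricted to the columns owned by q.
        # Because the local buffer is column-major, that block is contiguous.
        sendcounts = [rows_per_rank[me] * cols_per_rank[q] for q in range(nprocs)]
        # Rank `me` receives rank p's rows restricted to its own columns.
        recvcounts = [rows_per_rank[p] * cols_per_rank[me] for p in range(nprocs)]
        args[me] = (sendcounts, exclusive_prefix_sum(sendcounts),
                    recvcounts, exclusive_prefix_sum(recvcounts))
    return args


if __name__ == "__main__":
    # First case: I=2, J=78; node1 and node2 hold one row each, node3 holds none.
    for rank, (sc, sd, rc, rd) in alltoallv_args([1, 1, 0], 78).items():
        print(f"rank {rank}: np0={sc} dp0={sd} np1={rc} dp1={rd}")
```

The send side agrees with the values posted above; the receive side is what differs: every rank should expect 26 elements from node1 and from node2 and none from node3, i.e. np1=(26 26 0) and dp1=(0 26 52). The receive buffer is also filled in source-rank order, so each rank ends up with node1's 26 values followed by node2's 26 values rather than column-major A(1:I,1:26); a local reordering (or an MPI derived datatype) would still be needed to restore that layout.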

    ================================================================

    I also tested another case:
    I=5 and J=78 with two nodes.

    Current distribution of A:
    node1: A(1:3,1:78)
    node2: A(1:2,1:78)

    I want to redistribute it to:
    node1: A(1:5,1:39)
    node2: A(1:5,1:39)

    Again with B0((j-1)*I+i)=A(i,j):

    call mpi_alltoallv(B0,np0,dp0,mpi_double_complex,B1,np1,dp1,mpi_double_complex,mpi_comm_world,pierr)

    I checked the arguments on each node:
    node1: np0=(117 117), dp0=(0 117), np1=(117 117), dp1=(0 0)
    node2: np0=(78 78), dp0=(0 78), np1=(78 78), dp1=(117 117)

    This time it gave an error:

    job aborted:
    rank: node: exit code[: error message]
    node0:
    node1: 13: Fatal error in MPI_Alltoallv: Pending request (no error), error stack:
    MPI_Alltoallv(576): MPI_Alltoallv(sbuf=0x000000000517C690, scnts=0x0000000002EECC20, sdispls=0x0000000002EECBC0, MPI_DOUBLE_COMPLEX, rbuf=0x0000000004F42470, rcnts=0x0000000002EECB60, rdispls=0x0000000002EECB00, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD) failed
    (unknown)(): Pending request (no error)
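A likely cause is that the receive counts do not pair up with the send counts. With the same datatype on both sides, MPI_Alltoallv requires sendcounts(j) on rank i to equal recvcounts(i) on rank j. The pure-Python check below (the name count_mismatches is hypothetical, not part of MPI; ranks are 0-based, rank 0 = node1) compares the arrays posted for this case. It reports that node1 sends 117 elements to node2 while node2 expects 78, and node2 sends 78 to node1 while node1 expects 117. The posted np1 looks like the transpose of what each rank should pass, which is np1=(117 78) with dp1=(0 117) on both nodes; the posted dp1=(0 0) on node1 would also place both incoming blocks at offset 0. How a mismatch shows up (a hang, a truncation error, or a message like the one above) may depend on the MPI implementation.

```python
# Pure-Python check (no MPI calls) of the pairing rule for MPI_Alltoallv:
# with the same datatype on both sides, sendcounts[j] on rank i must equal
# recvcounts[i] on rank j. Ranks are 0-based here (rank 0 = node1).

def count_mismatches(sendcounts, recvcounts):
    """sendcounts[r] and recvcounts[r] are the arrays passed on rank r.
    Returns (sender, receiver, sent, expected) for every pair that disagrees."""
    n = len(sendcounts)
    bad = []
    for i in range(n):
        for j in range(n):
            if sendcounts[i][j] != recvcounts[j][i]:
                bad.append((i, j, sendcounts[i][j], recvcounts[j][i]))
    return bad


if __name__ == "__main__":
    # Second case as posted: np0 and np1 on node1 (rank 0) and node2 (rank 1).
    posted_np0 = [[117, 117], [78, 78]]
    posted_np1 = [[117, 117], [78, 78]]
    print(count_mismatches(posted_np0, posted_np1))
    # Receive counts that agree with what each rank actually sends.
    fixed_np1 = [[117, 78], [117, 78]]
    print(count_mismatches(posted_np0, fixed_np1))
```

The same check applied to the first case also flags node1 and node2 each expecting 26 elements from node3, which sends nothing, which would be consistent with the deadlock reported there.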


    Tuesday, April 6, 2010 3:04 PM

Answers

  • Looking at the first case you showed: np1 is the recvcounts argument, which gives the number of elements to be received from each process, and in MPI_Alltoallv it must match exactly what that process sends. You set it to (0 0 0) on node3. Shouldn't it be (26 26 0)? The same applies on node1 and node2, where (26 26 26) expects 26 elements from node3, which sends nothing. dp1 holds the displacements in the receive buffer; on node3 you set it to (52 52 52), which places every incoming block at the same offset. Could you check that all the parameters are set properly and then try again?

    Thanks,

    James

    Friday, April 9, 2010 12:26 AM
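Each rank usually knows only its own send counts. One common way to get matching receive counts is to exchange them first with MPI_ALLTOALL, one integer per rank, e.g. call mpi_alltoall(np0, 1, MPI_INTEGER, np1, 1, MPI_INTEGER, mpi_comm_world, ierr), and then build dp1 as running sums of np1. The sketch below models that exchange in plain Python (no MPI; the helper names are hypothetical, ranks are 0-based): the all-to-all of counts amounts to transposing the matrix of send counts.

```python
# Pure-Python model (no MPI calls) of obtaining receive counts by exchanging
# send counts, as MPI_ALLTOALL with one integer per rank would do.
# Ranks are 0-based here (rank 0 = node1).

def exchange_counts(sendcounts):
    """sendcounts[i] is the array rank i passes; rank j ends up with
    [sendcounts[0][j], sendcounts[1][j], ...], i.e. the transpose."""
    n = len(sendcounts)
    return [[sendcounts[i][j] for i in range(n)] for j in range(n)]


def exclusive_prefix_sum(counts):
    """Receive displacements when incoming blocks are packed back to back."""
    displs, total = [], 0
    for c in counts:
        displs.append(total)
        total += c
    return displs


if __name__ == "__main__":
    # np0 from the first case, as posted for node1, node2 and node3.
    np0 = [[26, 26, 26], [26, 26, 26], [0, 0, 0]]
    np1 = exchange_counts(np0)
    dp1 = [exclusive_prefix_sum(c) for c in np1]
    for rank in range(len(np0)):
        print(f"rank {rank}: np1={np1[rank]} dp1={dp1[rank]}")
```

Running it gives np1=(26 26 0) and dp1=(0 26 52) on every rank, and the same exchange applied to the second case gives np1=(117 78) on both nodes. Deriving the receive side from the send side this way avoids building np1 and dp1 by hand, which is where the transposition in the posted arguments appears to have crept in.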

All replies

  • Hi, I am getting the same error you had:

     Fatal error in MPI_Alltoallv: Pending request (no error), error stack

    Did you solve your problem? If so, and if you still remember, what was the problem in your case?

    Friday, May 18, 2012 5:20 PM