MPI program on HPC Server

  • Question

  • I compiled my program with VC++ 2008 and ran it on Windows HPC Server 2008, and I get an error message from the HPC Server.
    My cluster has 5 PCs, and every PC runs Windows HPC Server 2008.

    The roles are:
    Head node & compute node: USER
    Other compute nodes: SERVER1, SERVER2, SERVER3, SERVER4

    When I run the MPI program on USER, I get this error message:

    ========↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓========

    job aborted:
    [ranks] message

    [0] fatal error
    Fatal error in MPI_Scatter: Invalid communicator, error stack:
    MPI_Scatter(762): MPI_Scatter(sbuf=0x000000000031F7F0, scount=4, MPI_DOUBLE, rbuf=0x000000000031F804, rcount=4, MPI_DOUBLE, root=0, comm=0xcccccccc) failed
    MPI_Scatter(635): Invalid communicator

    [1-3] terminated

    ---- error analysis -----

    [0] on SERVER1
    mpi has detected a fatal error and aborted mpi_sar.exe

    ---- error analysis -----

    =========↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑=====================
    http://img14.imageshack.us/img14/1370/hnode.jpg

    But when I run the MPI program on SERVER1 (or SERVER2 ... SERVER4), the task result shows this error message:

    The task failed during execution with exit code 0. Please check task's output for error details.
    http://img14.imageshack.us/img14/7212/cnode.jpg

    My code can be downloaded here:
    http://www.xun6.com/file/8abc7bfc8/MPI_SAR.cpp.html

    The error message can be downloaded here:
    http://www.xun6.com/file/dbbf23e38/error.txt.html

    How can I solve this problem?

    Wednesday, January 27, 2010 9:48 AM


All replies

  • Hi YuJinSu,

    Some suggestions:

    1) How to debug.
       After you compile and build your MPI program, and before you run it through the HPC scheduler, run it from the command line:
                  mpiexec -n 1 MPI_SAR.exe
                  mpiexec -n X MPI_SAR.exe
       Once the command-line runs pass, use the HPC scheduler to run it on multiple nodes.

    2) Some code errors (not an exhaustive list).
        I checked your MPI code and there are several errors. For example:
             - The 'comm' variable is declared at line 13 but is never initialized before it is used in MPI_Scatter().
             - The phi[] array is an integer array, but in MPI_Scatter() you send and receive it as MPI_DOUBLE.
             - There are more errors; please use the procedure in 1) to find them. A minimal sketch of a correct MPI_Scatter call follows this list.
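
    For reference, a minimal sketch of an MPI_Scatter call that uses an initialized communicator (MPI_COMM_WORLD) and datatypes that match the buffers (MPI_INT). The buffer names and the count of 4 elements per rank are illustrative, not taken from MPI_SAR.cpp:

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char *argv[])
        {
            int rank, numprocs;
            int recvbuf[4];
            int *sendbuf = NULL;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

            /* The root provides 4 * numprocs ints; every rank receives 4. */
            if (rank == 0)
            {
                sendbuf = (int *)malloc(4 * numprocs * sizeof(int));
                for (int i = 0; i < 4 * numprocs; i++)
                    sendbuf[i] = i;
            }

            /* Use a communicator that has been initialized (MPI_COMM_WORLD)
               and a datatype that matches the int buffers (MPI_INT). */
            MPI_Scatter(sendbuf, 4, MPI_INT, recvbuf, 4, MPI_INT, 0, MPI_COMM_WORLD);

            printf("rank %d received %d %d %d %d\n",
                   rank, recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);

            if (rank == 0)
                free(sendbuf);
            MPI_Finalize();
            return 0;
        }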

    Hope this helps,

    Liwei
    • Marked as answer by YuJinSu Saturday, January 30, 2010 8:43 AM
    Wednesday, January 27, 2010 5:59 PM
  • I had a quick look at your source code. The biggest problem is that the array phi is accessed out of bounds:

    for ( i = 0; i < 360; i++) ;
    {
        phi[i] = i - 180;
    }

    You accidentally added a ";" at the end of the for statement, so the braces are no longer the loop body. After the (empty) loop finishes, i equals 360, and the block then accesses phi[360], which is past the end of the array and causes the runtime stack error.
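
    For comparison, the same loop without the stray semicolon (assuming phi[] is declared with at least 360 elements, as in the original code):

    /* The braces are now the loop body, and i stays below the array size. */
    for ( i = 0; i < 360; i++)
    {
        phi[i] = i - 180;
    }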

    Thanks,
    James

    Wednesday, January 27, 2010 7:33 PM
  • Thanks to Liwei Peng and James Ren for the suggestions.

    I changed my MPI code and that problem is gone, but now I have a new problem.

    My updated code can be downloaded here:
    http://www.xun6.com/file/cd8d16411/MPISAR.cpp.html

    The error message, both on the command line and through HPC Server, is:

    job aborted:
    [ranks] message

    [0] fatal error
    Fatal error in MPI_Send: Other MPI error, error stack:
    MPI_Send(175): MPI_Send(buf=0x00000000001BFC84, count=1, MPI_INT, dest=0, tag=10, MPI_COMM_WORLD) failed
    MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive

    [1-3] terminated

    ---- error analysis -----

    [0] on SERVER1
    mpi has detected a fatal error and aborted mpisar.exe

    ---- error analysis -----


    How can I solve this problem?

    Friday, January 29, 2010 9:45 AM
  • The problem is caused by the for loop that does the sends:

    for ( i = 0; i <= 360; i++)
    {
        phi = i - 180;
        j = i % 4;
        MPI_Send ((void *)&phi, 1, MPI_INT, j, itag, MPI_COMM_WORLD);
    }

    The destination of the MPI_Send is specified as j, which ranges from 0 to 3. When j is 0, rank 0 is sending to itself, and a blocking send to the local process without a matching receive already posted will deadlock.

    Another problem with the code is that you should use numprocs, the size of your communicator, as the number of processes. If you hard-code the value, the program will fail whenever you launch a number of processes other than the hard-coded one.
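
    For illustration only (this is not the code from MPISAR.cpp), a minimal sketch of one way to restructure the distribution so that rank 0 never blocks sending to itself and the destination range follows numprocs. The worker-side receive loop and the tag value of 10 are assumptions based on the error output above:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[])
        {
            int rank, numprocs, i, j, phi, itag = 10;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

            if (numprocs < 2)
            {
                /* This sketch needs at least one worker besides the root. */
                MPI_Finalize();
                return 1;
            }

            if (rank == 0)
            {
                /* Distribute phi = -180 .. 180 round-robin over ranks
                   1 .. numprocs-1 only, so the root never sends to itself
                   and nothing is hard-coded to 4 processes. */
                for (i = 0; i <= 360; i++)
                {
                    phi = i - 180;
                    j = 1 + i % (numprocs - 1);
                    MPI_Send((void *)&phi, 1, MPI_INT, j, itag, MPI_COMM_WORLD);
                }
            }
            else
            {
                /* Each worker receives its share of the 361 values. */
                int share = 361 / (numprocs - 1)
                          + (rank <= 361 % (numprocs - 1) ? 1 : 0);
                for (i = 0; i < share; i++)
                {
                    MPI_Recv(&phi, 1, MPI_INT, 0, itag, MPI_COMM_WORLD, &status);
                    printf("rank %d got phi = %d\n", rank, phi);
                }
            }

            MPI_Finalize();
            return 0;
        }

    A collective such as MPI_Scatterv could also distribute the values without any hand-written send/receive loop at all.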

    Thanks,
    James 

    • Marked as answer by YuJinSu Saturday, January 30, 2010 8:42 AM
    Friday, January 29, 2010 8:03 PM
  • Thank you, Liwei and James!

    The problem has been solved!
    Saturday, January 30, 2010 8:42 AM