MSMPI Buffer size issue ( Message truncated )

  • Question

  • Hello

    While executing my application using 2 processes (mpiexec -np 2 ****.exe), I am getting the following error:

    -------------------------------------------

     

    [0] fatal error

    Fatal error in MPI_Reduce: Message truncated, error stack:

    MPI_Reduce(804).....................: MPI_Reduce(sbuf=0x0000000141121C00, rbuf=0x000000000012F978, count=1, MPI_DOUBLE_PRECISION, MPI_MAX, root=0, MPI_COMM_WORLD) failed

    MPIR_Reduce(461)....................:

    MPIC_Recv(72).......................:

    MPIDI_CH3U_Request_unpack_uebuf(599): Message truncated; 64 bytes received but buffer size is 8

    ---------------------------------------------

     

     

    It works fine with just 1 process. I am compiling the code with the PGI compilers (in a Cygwin environment on Windows) and linking the MSMPI library through the -Mmpi=msmpi flag.

    I suppose the msmpi.dll being used is version 2.0.1551.0 and the OS is Windows Server 2008, HPC Edition Service Pack 2

     

    Kindly let me know if you have any suggestions to resolve this.

    Thanks & Regards,

    Kunal

    Tuesday, August 10, 2010 7:01 PM

Answers

  • Hi Kunal,

    I suspect there is a size mismatch in the MPI_Reduce call, as the error says. Could you examine the code where you call MPI_Reduce()? My suggestion is to add some print statements just before MPI_Reduce to make sure that the receive buffer size is the same as the send buffer size on every rank. You could also consider whether it is possible to simplify your code to help narrow down what is wrong.

    Thanks,

    James

    Thursday, August 12, 2010 10:30 PM
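
The numbers in the error stack can be read along the lines James suggests. The following is a minimal sketch in plain Python, not an MPI program: the helper names `elements_in` and `mismatched_ranks` are hypothetical, and it assumes one MPI_DOUBLE_PRECISION element occupies 8 bytes. Under that assumption, "64 bytes received but buffer size is 8" is consistent with the root expecting count=1 while the other rank sent 8 elements. Other explanations are possible, for example the two ranks reaching different MPI_Reduce call sites.

```python
# Hypothetical helpers for reading the "Message truncated" numbers.
# Plain Python only; nothing here calls into MPI.

DOUBLE_BYTES = 8  # assumed size of one MPI_DOUBLE_PRECISION element


def elements_in(nbytes, elem_bytes=DOUBLE_BYTES):
    """Convert a byte count from the error message into an element count."""
    if nbytes % elem_bytes != 0:
        raise ValueError("byte count is not a whole number of elements")
    return nbytes // elem_bytes


def mismatched_ranks(counts, root=0):
    """Return the ranks whose count argument differs from the root's."""
    expected = counts[root]
    return [rank for rank, count in enumerate(counts) if count != expected]


# Numbers taken from the error stack: 64 bytes received, 8-byte buffer.
sent = elements_in(64)     # elements the non-root rank appears to have sent
expected = elements_in(8)  # elements the root was prepared to receive
print(f"root expected {expected} element(s), peer sent {sent}")

# Counts as each rank might print them just before MPI_Reduce (illustrative).
print("ranks disagreeing with root:", mismatched_ranks([expected, sent]))
```

In the real application, the equivalent check would be to print the rank and the count argument immediately before each MPI_Reduce call and compare the values across the two processes.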

All replies

  • Hello Kunal,

    Would you mind posting your code here? It is hard to tell what is wrong based on the error message alone.

    Thanks,

    James

    Thursday, August 12, 2010 1:12 AM
  • Hi James,

      Thanks for your reply. I would have posted the code, but it is a very large application and I am not sure which part of it is causing the problem. When I run the job through the HPC Cluster Manager, it fails with the error message:

      -----------------------

       Task failed during execution with exit code -4. Please check task's output for error details.

      ----------------------

      And the output file at the end has:

      ---------------------

      job aborted:

    [ranks] message

    [0] fatal error

    Fatal error in MPI_Reduce: Message truncated, error stack:

    MPI_Reduce(804).....................: MPI_Reduce(sbuf=0x0000000141121C00, rbuf=0x000000000012F978, count=1, MPI_DOUBLE_PRECISION, MPI_MAX, root=0, MPI_COMM_WORLD) failed

    MPIR_Reduce(461)....................: 

    MPIC_Recv(72).......................: 

    MPIDI_CH3U_Request_unpack_uebuf(599): Message truncated; 64 bytes received but buffer size is 8

    [1] terminated

    ---- error analysis -----

    [0] on COMPUTE-NODE-2

    mpi has detected a fatal error and aborted flash3.exe

    ---- error analysis -----

    ------------------------------------

     

    Does the exit code -4 give any hint about the problem?

    Thanks & Regards,

    Kunal

    P.S. I am executing the job on a virtual cluster. Could the virtual environment be the cause?

    Thursday, August 12, 2010 3:30 AM