MPI_Recv - fatal error - received packet of unknown type (1234)

  • Question

  • Hello!

    Windows 7, two machines (BIGBROTHER and LITTLESISTER), home local network, HPC Pack 2012 MS-MPI Redistributable Package with Service Pack 1.

    When I try to use MPI_Send (on BIGBROTHER) and MPI_Recv (on LITTLESISTER), my program returns this error:

    job aborted:

    [ranks] message

    [0] terminated

    [1] fatal error

    Fatal error in PMPI_Recv: Internal MPI error!, error stack:

    MPI_Recv(buf=0x0042FBDC, count=64, MPI_CHAR, src=0, tag=0, MPI_COMM_WORLD, status=0x0042FB78) failed

    [ch3:sock] received packet of unknown type (1234)

    ---------------------------------------------------------

    My code here:

    http://www.sourcepod.com/psznnl57-21098

    -------------------------------------------------------------------------------

    I tried to google the error message, but found nothing. It is a very strange error. And MPI_Send works fine, so why does MPI_Recv return a fatal error?

    • Edited by ivanov.jaroslaw Monday, December 9, 2013 3:39 PM don't find in web
    Sunday, December 8, 2013 10:32 AM

All replies

  • Hi Ivanov,

    Does the code work locally on a single machine (i.e., on either BIGBROTHER or LITTLESISTER: mpiexec -n 2 mpi_hello.exe)? If it works locally but not when you use the two hosts, it's possible they're using mixed versions of MS-MPI (or different MPI implementations). Dependency Walker will be able to tell you which MPI mpi_hello is linked against.
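    One quick way to compare the MS-MPI binaries on the two hosts (without installing Dependency Walker) is to hash msmpi.dll on each machine and compare. This is a sketch; the path assumes the default System32 location mentioned later in this thread, and certutil ships with Windows:

    ```shell
    :: Run on BOTH hosts and compare the output line by line.
    where mpiexec.exe
    where msmpi.dll

    :: Hashes that differ mean the two hosts have different MS-MPI builds.
    certutil -hashfile C:\Windows\System32\msmpi.dll MD5
    ```

    If the hashes differ, reinstall the same MS-MPI redistributable on both machines before retesting.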

    I just tried your code both locally and on two different hosts (mpiexec -hosts 2 host1 1 host2 1 mpi_hello.exe) and it works fine for me.

    Can you tell me how you launched the job and the output of the following:

    1) where mpiexec.exe

    2) where msmpi.dll

    Thanks,

    Anh

    Monday, December 9, 2013 4:09 PM
  • Yes, the code works locally on a single machine (on either BIGBROTHER or LITTLESISTER):

    runas /user:BIGBROTHER\cluster "cmd /K mpiexec -n 2 \\BIGBROTHER\MPI\mpi_comm.exe"

    runas /user:LITTLESISTER\cluster "cmd /K mpiexec -n 2 \\BIGBROTHER\MPI\mpi_comm.exe"

    But when I try to launch mpi_comm.exe on both machines, I get the error: Fatal error in PMPI_Recv: Internal MPI error!, error stack: ...

    on the BIGBROTHER machine (smpd is running on both):

    runas /user:BIGBROTHER\cluster "cmd /K mpiexec -hosts 2 BIGBROTHER 1 LITTLESISTER 1 \\BIGBROTHER\MPI\mpi_comm.exe"

    -------------------------------------------------

    mpi_hello works fine: 

    #include <stdio.h> 
    #include "mpi.h" 
    int main(int argc, char* argv[]) 
    { 
       int procs_rank, procs_count; 
       MPI_Init(&argc, &argv); 
       MPI_Comm_size(MPI_COMM_WORLD, &procs_count); 
       MPI_Comm_rank(MPI_COMM_WORLD, &procs_rank); 
       printf ("\n Hello, World from process %d of %d", procs_rank, procs_count); 
       MPI_Finalize(); 
       return 0; 
    }

    but mpi_comm doesn't work (when I use MPI_Send and MPI_Recv):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, nproc, name_len;
       char processor_name[MPI_MAX_PROCESSOR_NAME];
       double start_time, end_time;
       char send_buf[64], recv_buf[64];
       MPI_Status st;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &nproc);
       MPI_Get_processor_name(processor_name, &name_len);
       start_time = MPI_Wtime();

       switch (rank)
       {
       case 0:
          sprintf(send_buf, "Hello from process 0");
          MPI_Send(send_buf, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          break;
       default:
          MPI_Recv(recv_buf, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
          printf("Process %d received %s\n", rank, recv_buf);
       }

       end_time = MPI_Wtime();
       printf("Work time %f sec\n", end_time - start_time);

       MPI_Finalize();
       return 0;
    }


    Monday, December 9, 2013 4:26 PM
  • Hi Ivanov,

    I did try mpi_comm (I just didn't know what you named it, so I assumed it was still named mpi_hello) and it worked for me. Can you send me the output of the following:

    runas /user:BIGBROTHER\cluster  "cmd /K where mpiexec.exe && where msmpi.dll"

    Monday, December 9, 2013 8:02 PM
  • Can user BIGBROTHER\cluster run programs on the LITTLESISTER machine? I doubt it.

    /user:BIGBROTHER\cluster means local user credentials on the BIGBROTHER machine, but not on LITTLESISTER.

    I suggest creating an account with the same name and password on both machines, or creating a domain environment and running mpiexec with domain user credentials.
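    For example, a sketch using the account name and password mentioned in this thread (run from an elevated command prompt on each machine; local password policy may require a stronger password):

    ```shell
    :: Create matching local accounts on BOTH machines.
    :: "cluster"/"cluster" are the name and password used in this thread.
    net user cluster cluster /add

    :: Verify the account exists:
    net user cluster
    ```

    With identical local accounts, smpd and mpiexec can authenticate on the remote node without a domain.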


    Daniel Drypczewski

    Tuesday, December 10, 2013 3:50 AM
  • And does the LITTLESISTER machine have access to the shared folder \\BIGBROTHER\MPI?

    Daniel Drypczewski

    Tuesday, December 10, 2013 3:53 AM
  • Hi Anh, I decided that I should screenshot how my programs work:

    1) http://postimg.org/gallery/18zlnl5vm/39a082ad/

    So, the code for mpi_hello and mpi_comm is in my last message. mpi_hello works fine on BIGBROTHER, on LITTLESISTER, and on both together (whether I run it from BIGBROTHER or from LITTLESISTER). mpi_comm works fine on BIGBROTHER and on LITTLESISTER, but NOT on both together (MPI_Recv error).

    In the screenshots I see something strange about mpi_comm. Process 0 should send a message to process 1, and process 1 should print that it received the message. So process 0 should start earlier, yes? But in the screenshot I see that process 1 runs first and process 0 after it. Very, very strange, I think.

    Do you understand that I run 2 programs, mpi_hello and mpi_comm? Does mpi_comm work on your machines?

    2) About mpiexec.exe and msmpi.dll.

    C:\Program Files\Microsoft HPC Pack 2012\Bin\mpiexec.exe

    C:\Windows\System32\msmpi.dll

    But why do you ask me about that, if mpi_hello works fine?

    Saturday, December 14, 2013 4:42 PM
  • Hi Daniel,

    the BIGBROTHER machine can run programs on the LITTLESISTER machine; you can see it in the screenshots (http://postimg.org/gallery/18zlnl5vm/39a082ad/).

    I start mpiexec on BIGBROTHER as cluster/cluster and I can make mpiexec run on the LITTLESISTER machine (and vice versa).

    I created a user "cluster" with password "cluster" on both machines. Maybe I shouldn't run mpiexec through RUNAS as BIGBROTHER\cluster or LITTLESISTER\cluster? When I start the mpi_hello program it works fine. The error appears when I start the mpi_comm program:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, nproc, name_len;
       char processor_name[MPI_MAX_PROCESSOR_NAME];
       double start_time, end_time;
       char send_buf[64], recv_buf[64];
       MPI_Status st;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &nproc);
       MPI_Get_processor_name(processor_name, &name_len);
       start_time = MPI_Wtime();

       switch (rank)
       {
       case 0:
          sprintf(send_buf, "Hello from process 0");
          MPI_Send(send_buf, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          break;
       default:
          MPI_Recv(recv_buf, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
          printf("Process %d received %s\n", rank, recv_buf);
       }

       end_time = MPI_Wtime();
       printf("Work time %f sec\n", end_time - start_time);

       MPI_Finalize();
       return 0;
    }

    Saturday, December 14, 2013 4:48 PM
  • Yes, mpi_hello works fine on both machines

    http://postimg.org/image/rf7poxkap/ - start

    http://postimg.org/image/6m0pms2dd/ - work fine

    Saturday, December 14, 2013 4:50 PM
  • Yes, mpi_comm works fine on my machine. I also tried setting up a different user and ran commands similar to yours. If you log in to both boxes as user cluster, does it still work?

    If not, can you try running this:

    runas /user:BIGBROTHER\cluster "cmd /K mpiexec -env MSMPI_DUMP_MODE 4 -hosts 2 BIGBROTHER 1 LITTLESISTER 1 \\BIGBROTHER\MPI\mpi_comm.exe"

    In the user profile directory (of the user cluster) you should see the dump files for each process. If you can upload the dump files, we can help investigate this.

    Thanks

    Anh

    Tuesday, December 17, 2013 9:53 PM
  • When I log in as the cluster user on both machines (in my last attempt I logged in as the administrator Yaroslaw on both machines) there is an error when I try to start mpi_hello or mpi_comm:

    Aborting: Access denied by node 'LITTLESISTER'

    The smpd daemon is running with user credentials which are different from the user running the job.

    Maybe I should create the "cluster" user with special options? Or create a network user? I created the cluster users as ordinary users on both machines separately (just with the same name and password). Is that correct? Maybe these are two different cluster users?

    Oh... Now, when I try to start my program with the last scheme, I get the error "Aborting: Access denied..."

    Oh my god. What am I doing wrong?

    I power on machine 1 and machine 2 in VMware. Machine 1 has a shared folder, machine 2 can see it. I set up the firewall on both machines (mpi_hello.exe, smpd and mpiexec).
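    For reference, the firewall exceptions can also be scripted. This is a sketch assuming the default HPC Pack 2012 paths and the mpi_comm.exe name used in this thread; run from an elevated command prompt on each machine:

    ```shell
    netsh advfirewall firewall add rule name="MS-MPI smpd" dir=in action=allow program="C:\Program Files\Microsoft HPC Pack 2012\Bin\smpd.exe"
    netsh advfirewall firewall add rule name="MS-MPI mpiexec" dir=in action=allow program="C:\Program Files\Microsoft HPC Pack 2012\Bin\mpiexec.exe"

    :: If the exe is launched from a share, the rule may need the local path
    :: the share resolves to rather than the UNC path.
    netsh advfirewall firewall add rule name="mpi_comm" dir=in action=allow program="\\BIGBROTHER\MPI\mpi_comm.exe"
    ```

    Scripting the rules keeps both machines consistent, which is easy to get wrong when adding exceptions by hand in the firewall UI.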

    I run smpd on both machines:

    runas /user:BIGBROTHER\cluster "C:\Program Files\Microsoft HPC Pack 2012\Bin\smpd.exe -d"

    on machine 1, and

    runas /user:LITTLESISTER\cluster "C:\Program Files\Microsoft HPC Pack 2012\Bin\smpd.exe -d"

    on machine 2.

    On machine 1 I run: runas /user:BIGBROTHER\cluster "cmd /K mpiexec -hosts 2 BIGBROTHER 1 LITTLESISTER 1 \\BIGBROTHER\MPI\mpi_comm.exe"

    And it returns the access denied error.

    Wednesday, December 18, 2013 9:10 AM