Question on gathering data from nodes

  • Question

  • #include <stdio.h>
    #include <mpi.h>
    #include <unistd.h>
    #include <iostream>

    #define dataSizeTotal 10000

    using namespace std;

    float dataRoot[dataSizeTotal];

    int main(int argc, char *argv[]) {

       for(int i = 0; i < dataSizeTotal; i++)
       {
            dataRoot[i] = 2.0;
       }

       MPI_Init(&argc, &argv);
       int commSize, commRank, j;
       MPI_Comm_size(MPI_COMM_WORLD, &commSize);
       MPI_Comm_rank(MPI_COMM_WORLD, &commRank);

       //the array pointer in each node points to certain part of the big array
       float *sub_dataRoot = dataRoot + commRank * dataSizeTotal/commSize;

       //modify the big array through each node
       for (j = 0; j < dataSizeTotal / commSize; j++)
       sub_dataRoot[j] = sub_dataRoot[j] + (float)commRank + 1.0;

       MPI_Barrier(MPI_COMM_WORLD);

       //check results
       if ( commRank == 0) {

          printf( "Through the root node, the value of the 6000th item is:%f\n\n", dataRoot[6000] );
       }

       MPI_Barrier(MPI_COMM_WORLD);

       if ( commRank == 1 ) {

          printf( "Through the second node, the value of the 6000th item is:%f\n", dataRoot[6000] );
       }

       MPI_Finalize();
       return 0;
    }

    I ran this program on two nodes by executing "mpiexec -np 2 ./mpitest". The program creates an array "dataRoot" of 10,000 numbers, and each element of this array is initialized to 2.0. On each node, I set a pointer to a certain part of "dataRoot": the pointer on the first node should point to the address of dataRoot[0], and the one on the second node to the address of dataRoot[5000]. The program then modifies the original array on each node. The first node is supposed to change the first 5000 elements to 3.0, and the second node is supposed to change the next 5000 elements to 4.0. However, when I read the element dataRoot[6000] through the root node and the second node respectively, I get:

    Through the root node, the value of the 6000th item is:2.000000

    Through the second node, the value of the 6000th item is:4.000000

    It seems that each node keeps its own copy of dataRoot[10000], even though it is declared before MPI_Init(), and each node can only access its own copy of dataRoot. Is that the case? Is it possible for all the nodes to work on the same copy of dataRoot, so that no data distribution and gathering are necessary? I understand that I can use MPI_Scatter and MPI_Gather for this purpose, but then the execution time becomes longer.

    Thanks for your advice.

    Monday, November 28, 2011 8:39 PM

Answers

  • #include <stdio.h>
    #include <mpi.h>
    #include <assert.h> 
    #define dataSizeTotal 10000
     
    int main(int argc, char *argv[]) {
        int i = 0;
        int j = 0;
        int commSize, commRank;
        float dataRoot[dataSizeTotal];
        float* sub_dataRoot;
        MPI_Win win;
        MPI_Init(&argc, &argv);
        
        MPI_Comm_size(MPI_COMM_WORLD, &commSize);
        MPI_Comm_rank(MPI_COMM_WORLD, &commRank);
        for(i = 0; i < dataSizeTotal; i++)
        {
             dataRoot[i] = 2.0;
        }
    
        // the array pointer in each node points to certain part of the big array
       sub_dataRoot = dataRoot + commRank * dataSizeTotal/commSize;
     
       //modify the big array through each node
       for (j = 0; j < dataSizeTotal / commSize; j++)
           sub_dataRoot[j] = sub_dataRoot[j] + (float)commRank + (float)1.0;
     
        MPI_Win_create(dataRoot, dataSizeTotal * sizeof(float), sizeof(float), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        // rank 1 writes its slice directly into rank 0's window (passive target)
        if (commRank == 1) {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, MPI_MODE_NOCHECK, win);
            MPI_Put(sub_dataRoot, dataSizeTotal/commSize, MPI_FLOAT, 0,
                    dataSizeTotal/commSize, dataSizeTotal/commSize, MPI_FLOAT, win);
            MPI_Win_unlock(0, win);
        }
        MPI_Barrier(MPI_COMM_WORLD);

        // check results
        if (commRank == 0) {
            for (j = 0; j < dataSizeTotal/commSize; j++)
                assert(dataRoot[j] == 3.0);
            for (j = dataSizeTotal/commSize; j < dataSizeTotal; j++)
                assert(dataRoot[j] == 4.0);
            printf("Through the root node, the value of the 6000th item is:%f\n\n", dataRoot[6000]);
        }
        MPI_Barrier(MPI_COMM_WORLD);

        if (commRank == 1) {
            printf("Through the second node, the value of the 6000th item is:%f\n", dataRoot[6000]);
        }

        MPI_Win_free(&win);  // the window must be freed before MPI_Finalize
        MPI_Finalize();
        return 0;
    }

    Hi Edwin,
    The above code accomplishes what your code was trying to do. Notice the usage of MPI one-sided communication. Both processes specify a "window", which is a block of memory that other processes can read/write. There are many different ways to do one-sided communication. In this code, I've chosen the "passive target" approach, in which the target (rank 0) does not have to participate in the synchronization. You can read more about MPI one-sided communication in the MPI standard. Let me know if my explanation was not clear enough or if you have trouble understanding the code.
    Friday, December 2, 2011 10:29 PM

All replies

  • Hi,

    MPI is distributed in nature, so each process cannot natively access another process's data. In this particular case, rank 0 and rank 1 each have their own copy of dataRoot. There are two ways both processes can cooperatively work on this array:

    1) Through the use of two-sided communication. MPI_Scatter/MPI_Gather are examples.

    2) Through the use of one-sided communication. In this case both processes create a window that, for example, points to rank 0's copy of dataRoot. Rank 1 can then read/write rank 0's memory using MPI one-sided communication (unlike the first paradigm, rank 0 in this case doesn't even have to participate in the MPI calls).

    Let me know if you want further clarification or explanation.
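    To make the first point concrete, here is a small plain-C sketch (no MPI involved; the two "ranks" are just simulated with two arrays in one process) showing why each rank's private copy stays stale until the slices are explicitly brought back together. The `memcpy` here stands in for what MPI_Gather would do over the network; the array sizes mirror the ones in the question:

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    #define TOTAL  10000
    #define NPROCS 2

    int main(void) {
        /* Each simulated "rank" gets its own private copy of the whole
           array, exactly as each MPI process does. */
        static float copy[NPROCS][TOTAL];
        int chunk = TOTAL / NPROCS;
        for (int r = 0; r < NPROCS; r++)
            for (int i = 0; i < TOTAL; i++)
                copy[r][i] = 2.0f;

        /* Each rank updates only its own slice of its own copy. */
        for (int r = 0; r < NPROCS; r++)
            for (int j = 0; j < chunk; j++)
                copy[r][r * chunk + j] += (float)r + 1.0f;

        /* Rank 0's copy still holds 2.0 in rank 1's slice... */
        assert(copy[0][6000] == 2.0f);

        /* ...until the slices are gathered back into rank 0's copy
           (this memcpy is the stand-in for MPI_Gather). */
        for (int r = 1; r < NPROCS; r++)
            memcpy(&copy[0][r * chunk], &copy[r][r * chunk],
                   chunk * sizeof(float));

        assert(copy[0][6000] == 4.0f);
        printf("after gather, rank 0 sees dataRoot[6000] = %f\n", copy[0][6000]);
        return 0;
    }
    ```
    
    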

    Wednesday, November 30, 2011 7:02 PM
  • Hi Anh.Vo:

    Thank you for your reply. I tried "Gather" and it works. In my example, since both rank 0 and rank 1 get a copy of "dataRoot", I don't even have to "Scatter" it. But I do need to "Gather" the data back from each rank into "dataRoot" on rank 0, and then I can read the modified results correctly (from rank 0). I am not clear about the low-level mechanism of MPI, so my question is: since I only declared and defined "dataRoot" once, does the system, after "MPI_Init", automatically duplicate this array and assign a copy to each rank? I guess that's the case, since MPI doesn't really use shared memory, or rather it treats shared memory as distributed memory.
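    As a side note on the duplication question: each MPI process is an ordinary OS process with its own address space, so globals are duplicated per process at launch, not shared. The same effect can be seen without MPI at all, using plain POSIX fork() (a hypothetical stand-in here, just to illustrate separate address spaces):

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* A global, declared before any "init" call, just like dataRoot. */
    static float shared_looking = 2.0f;

    int main(void) {
        pid_t pid = fork();          /* two processes, like mpiexec -np 2 */
        if (pid == 0) {
            shared_looking = 4.0f;   /* the child writes its own copy... */
            exit(0);
        }
        wait(NULL);
        /* ...but the parent's copy is untouched: separate address spaces. */
        assert(shared_looking == 2.0f);
        printf("parent still sees %f\n", shared_looking);
        return 0;
    }
    ```

    That is why, without an explicit Gather (or one-sided Put/Get), one rank's writes are never visible to another.
    
    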

    I am interested in the second method you mentioned. Could you give more details? I would appreciate it if you could give the code directly, if it wouldn't take too much of your time. Thanks a lot,

    Edwen
    Wednesday, November 30, 2011 7:45 PM
    Thank you so much, Anh. I also spent some time reading about one-sided communication, and that helped me understand your code. Many many thanks!
    Edwen
    Saturday, December 3, 2011 1:43 PM