
Question
-
I use MPICH2 together with Visual C++ 6.0 to solve a system of equations by the Cholesky method. It gives me the following error:
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(174): MPI_Send(buf=0012BF80, count=4, MPI_DOUBLE, dest=0, tag=0, MPI_COMM_WORLD) failed
MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive

job aborted:
process: node: exit code: error message:
0: localhost: 1: Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(174): MPI_Send(buf=0012BF80, count=4, MPI_DOUBLE, dest=0, tag=0, MPI_COMM_WORLD) failed
MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive
Press any key to continue

My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
//#define MPICH_SKIP_MPICXX // this line is required, otherwise a compile error occurs
#include "mpi.h"

#define MAX_PROCESSOR_NUM 12 /* set the max number of processors */
#define MAX_ARRAY_SIZE 32 /* set the max size of the array */

int main(int argc, char *argv[])
{
double a[MAX_ARRAY_SIZE][MAX_ARRAY_SIZE], g[MAX_ARRAY_SIZE][MAX_ARRAY_SIZE];
int n;
double transTime = 0, tempCurrentTime, beginTime;
MPI_Status status;
int rank, size = 2;
FILE *fin;
int i, j, k;
int membershipKey;
membershipKey = rank % 3;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
// MPI_Comm_split(MPI_COMM_WORLD, membershipKey, rank, &myComm);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// MPI_Group_excl(MPI_GROUP_WORLD, 1, ranks, &grprem); /* create a new process group that excludes process 0 */
if(rank == 0)
{
fin = fopen("dataIn.txt","r");
if (fin == NULL) /* have no input source file */
{
puts("Cannot find the input data file");
puts("Please create a file \"dataIn.txt\"");
puts("<example for dataIn.txt> ");
puts("4");
puts("1 3 4 5");
puts("3 5 6 1");
puts("4 6 2 7");
puts("5 1 7 8");
puts("\nArter\'s default data are running for you\n");
}
else
{
fscanf(fin, "%d", &n);
if ((n < 1)||(n > MAX_ARRAY_SIZE)) /* the matrix input error */
{
puts("The input matrix size is out of range!");
exit(-1);
}
for(i = 0; i < n; i ++) /* read the matrix data in */
{
for(j = 0; j < n; j ++) fscanf(fin,"%lf", &a[i][j]);
}
}

/* print out the matrix */
puts("Cholesky Decomposition");
puts("Input Matrix A from dataIn.txt");
for(i = 0; i < n; i ++)
{
for(j = 0; j < n; j ++) printf("%9.5f ", a[i][j]);
printf("\n");
}
printf("\n");
}

MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
for(k = 0; k < n; k ++)
{
/* gather the result, then broadcast it to each processor */
MPI_Bcast(a[k], (n-k)*MAX_ARRAY_SIZE, MPI_DOUBLE, 0, MPI_COMM_WORLD);
size = 2;
for(i = k+rank; i < n; i += size) // does the parallel part start here?
{
for(j = 0; j < k; j ++)
{
g[i][j] = a[i][j];
}
if (i == k) // rank = 0
{
for(j = k; j < n; j ++) g[i][j] = a[i][j]/sqrt(a[k][k]); // includes a[k][k]
}
else
{
g[i][k] = a[i][k]/sqrt(a[k][k]);
for(j = k+1; j < n; j ++) g[i][j] = a[i][j] - a[i][k]*a[k][j]/a[k][k];
}
} /* use the Cholesky algorithm */
for(i = k +rank; i < n; i ++)
{
MPI_Send(g[i], n, MPI_DOUBLE, 0, k*1000+i, MPI_COMM_WORLD);
}

if(rank == 0)
{
for(j = 0; j < size; j ++)
{
for(i = k + j; i < n; i += size)
{
MPI_Recv(a[i], n, MPI_DOUBLE, j, k*1000+i, MPI_COMM_WORLD, &status);
}
}
}
}

if (rank == 0)
{
puts("After Cholesky Decomposition");
puts("Output Matrix G");
for(i =0; i < n; i ++)
{
for(j = 0; j < i; j ++) printf(" ");
for(j = i; j < n; j ++) printf("%9.5f ", a[i][j]);
printf("\n");
} /* output the result */
}

MPI_Finalize(); /* end of the program */
return 0;
}

Tuesday, November 16, 2010 4:39 AM
Answers
-
Hello Changyun,
I did a quick check of the part of your source code related to MPI_Send and MPI_Recv. It looks like they don't match.
To illustrate the problem, let's assume there are just two processes and the array size is just 4 (n = 4 in your code).
For the first iteration of the k loop (say k = 0), rank 0 issues 4 MPI_Send calls to rank 0, and rank 1 issues 3:
for(i = k +rank; i < n; i ++)
{
MPI_Send(g[i], n, MPI_DOUBLE, 0, k*1000+i, MPI_COMM_WORLD);
}

Next, see what you have in MPI_Recv. Remember that we assume size = 2 and k = 0. Rank 0 posts 2 MPI_Recv calls for messages from rank 0 and 2 for messages from rank 1 — 4 receives in total. But you have 4 MPI_Send calls from rank 0 and 3 from rank 1, 7 sends in total. So MPI_Send and MPI_Recv are mismatched, as the error message indicates. (Note also that the very first MPI_Send is rank 0 sending to itself before any receive is posted, which is exactly the local-process deadlock MPICH2 reports.)
if(rank == 0)
{
for(j = 0; j < size; j ++)
{
for(i = k + j; i < n; i += size)
{
MPI_Recv(a[i], n, MPI_DOUBLE, j, k*1000+i, MPI_COMM_WORLD, &status);
}
}
}

Thanks,
James
- Marked as answer by Don Pattee Wednesday, January 12, 2011 3:00 AM
Tuesday, November 16, 2010 5:16 PM