Hello everybody, I have built a cluster running Windows Compute Cluster Server 2003 (WCCS2003) and am running the Linpack benchmark on it, but one problem has been bothering me for a long time. My cluster has 3 nodes, and each node has four dual-core Intel Xeon processors and 16 GB of RAM. Using the HPL.DAT below, I ran Linpack successfully on each node individually. However, when I changed only P x Q (e.g. to 1 x 24 or 4 x 6) and ran the job across all 3 nodes, it took much longer to compute than on a single node.
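For reference, here is a small sketch of the arithmetic behind those grid choices. It assumes one MPI process per core, which is what the 1 x 24 and 4 x 6 grids imply for 3 nodes with 8 cores each. It lists the P x Q factorizations of 24 (only P <= Q, since the HPL tuning notes suggest Q at least as large as P). It also computes the size of the N = 10000 matrix from the file, which is small compared with the 48 GB of total RAM. The helper names are illustrative only.

```python
import math

# Cluster described above: 3 nodes, 4 dual-core Xeon processors and 16 GB RAM per node.
NODES = 3
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 2

def process_grids(nprocs):
    """Return every (P, Q) pair with P * Q == nprocs and P <= Q."""
    return [(p, nprocs // p) for p in range(1, math.isqrt(nprocs) + 1)
            if nprocs % p == 0]

def matrix_gib(n):
    """Memory needed for an N x N matrix of doubles (8 bytes each), in GiB."""
    return n * n * 8 / 2**30

# Assumes one MPI process per core across all nodes.
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
print(total_cores)                  # 24
print(process_grids(total_cores))   # [(1, 24), (2, 12), (3, 8), (4, 6)]
print(round(matrix_gib(10000), 2))  # 0.75
```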
What's more, when I ran "mpipingpong.exe" on the 3 nodes, CPU usage on every node stayed at 100% for several hours, yet the ping-pong test never returned a result. I do not know how to solve this problem and would appreciate your help.
Has anyone else run into the same problem?
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 Ns
1 # of NBs
200 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
4 Qs
16.0 threshold
1 # of panel fact
1 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
80 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
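As a quick consistency check before a multi-node run, the sketch below reads the process-grid lines from the header of the file above and compares P x Q with the number of MPI processes launched. It assumes the standard HPL.dat line order shown here: line 10 gives the number of grids, and lines 11 and 12 give the P and Q values. The functions `read_grids` and `check_grids` are hypothetical helpers, not part of HPL.

```python
# First 12 lines of the HPL.dat above; the rest of the file is not needed here.
HPL_DAT = """HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 Ns
1 # of NBs
200 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
4 Qs
"""

def read_grids(text):
    """Return the (P, Q) pairs listed in an HPL.dat header.

    Assumes the standard line order: line 10 holds the number of grids,
    lines 11 and 12 hold the P and Q values (one value per grid).
    """
    lines = text.splitlines()
    ngrids = int(lines[9].split()[0])
    ps = [int(v) for v in lines[10].split()[:ngrids]]
    qs = [int(v) for v in lines[11].split()[:ngrids]]
    return list(zip(ps, qs))

def check_grids(text, nprocs):
    """Pair each grid with whether P * Q equals the number of MPI processes."""
    return [(p, q, p * q == nprocs) for p, q in read_grids(text)]

print(check_grids(HPL_DAT, 4))   # [(1, 4, True)]   single-node run as posted
print(check_grids(HPL_DAT, 24))  # [(1, 4, False)]  file not yet edited for 24 processes
```

For the 3-node runs, Ps and Qs would be changed to, for example, 4 and 6, and the check would then pass for 24 processes.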