Submit failed

Question
-
My code (download):
http://www.xun6.com/file/7212ee611/SARMPI.cpp.html
Error message (download):
http://www.xun6.com/file/3741b4111/error.txt.html
The error message reads:
job aborted:
[ranks] message
[0] fatal error
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(176)...........: MPI_Send(buf=0x00000000002BFA24, count=1, MPI_INT, dest=1, tag=10, MPI_COMM_WORLD) failed
MPIDI_CH3I_Progress(244): handle_sock_op failed
ConnectFailed(1061).....: [ch3:sock] failed to connnect to remote process 1735B01A-0ACE-41e3-837C-BBFFEF33623D:1
ConnectFailed(986)......: unable to connect to 192.168.1.2 on port 59023, exhausted all endpoints
ConnectFailed(977)......: unable to connect to 192.168.1.2 on port 59023, A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)
[1-3] terminated
---- error analysis -----
[0] on SERVER1 (sometimes the failure is reported on SERVER2 or SERVER3 instead, as unable to connect to 192.168.1.3 or 192.168.1.4)
mpi has detected a fatal error and aborted mpisar.exe
---- error analysis -----
PS: 192.168.1.2 is SERVER2.
In HPC Cluster Manager:
When I submit a simple program (e.g. helloMPI.exe), it runs fine:
http://img39.imageshack.us/img39/2372/75741408.jpg
But when I submit the SARMPI.exe program, it fails.
The HPC services are running normally:
http://img39.imageshack.us/img39/6289/16866294.jpg
All my PCs can ping each other on 192.168.1.X.
So I think the error comes from the SARMPI code, but I don't know which part is the problem.
Please help me, thanks!
Thursday, March 4, 2010 6:27 AM
Answers
-
All the firewalls should be turned off. Did you mean turned on or off? If you meant the firewalls are turned off and you still have the problem running the MPI apps, please copy the command you used for job submission.
Thanks,
James
I solved my problem by changing my code:
Before: SARMPI.exe http://www.xun6.com/file/781f82222/SARMPI.cpp.html (job state: Running)
After: MPISAR.exe http://www.xun6.com/file/66e677822/MPISAR.cpp.html (job state: Finished)
Change at the start: for (i = 0; i <= 360; i++) → if (myid == 0)
What is the principle behind this?
Haha, I already solved it.
In the SARMPI.exe code .....
while (i < 90)
{
    i = 0;  // unnecessary
}
- Marked as answer by YuJinSu Friday, March 5, 2010 7:12 AM
Friday, March 5, 2010 7:06 AM
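For readers hitting the same symptom, the loop quoted above can be sketched in isolation. This is only a minimal illustration, not the original SARMPI.cpp: the elided loop body is assumed to advance i once per pass, and run_loop and its max_iters safety cap are hypothetical names added here so the sketch always terminates. Under those assumptions it suggests why the job may have stayed in the Running state: if i is reset to 0 on every pass, the condition i < 90 never becomes false.

```cpp
#include <cstddef>

// Hypothetical helper (not from SARMPI.cpp): runs a loop shaped like the
// one quoted in the post and returns how many passes it made.
// max_iters is a safety cap added for this sketch so the buggy variant
// stops instead of spinning forever.
std::size_t run_loop(bool reset_counter_each_pass, std::size_t max_iters) {
    int i = 0;
    std::size_t passes = 0;
    while (i < 90 && passes < max_iters) {
        if (reset_counter_each_pass) {
            i = 0;  // the stray reset: i can never climb toward 90
        }
        ++i;        // assumed: the real loop body advances i once per pass
        ++passes;
    }
    return passes;
}
```

With the reset removed, run_loop(false, 1000) finishes after 90 passes. With the reset in place, run_loop(true, 1000) only stops because of the cap, so it returns 1000.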
All replies
-
Hi Yujin,
From the error message, it might be a firewall issue. If you are running your MPI apps on the enterprise-only network, you should configure the firewall on every node to open the ports used by the MPI service and the MPI application. Or, more simply, turn off the firewall on each node. Please let me know whether this works in your case.
Thanks,
James
Thursday, March 4, 2010 7:50 PM
-
Hi James Ren,
I am sure the firewall is turned on on all PCs, but that doesn't solve the problem.
Friday, March 5, 2010 12:59 AM
-
All the firewalls should be turned off. Did you mean turned on or off? If you meant the firewalls are turned off and you still have the problem running the MPI apps, please copy the command you used for job submission.
Thanks,
James
Friday, March 5, 2010 4:04 AM
-
Sorry! I meant that the firewall is turned off on all my PCs.
In HPC Cluster Manager, the command line is: mpiexec sarmpi.exe
Friday, March 5, 2010 6:01 AM
-
I solved my problem by changing my code:
Before: SARMPI.exe http://www.xun6.com/file/781f82222/SARMPI.cpp.html (job state: Running)
After: MPISAR.exe http://www.xun6.com/file/66e677822/MPISAR.cpp.html (job state: Finished)
Change at the start: for (i = 0; i <= 360; i++) → if (myid == 0)
What is the principle behind this?
Friday, March 5, 2010 6:29 AM