Submit failed

  • Question

  • My code:
    http://www.xun6.com/file/7212ee611/SARMPI.cpp.html
    Error message:
    http://www.xun6.com/file/3741b4111/error.txt.html

    The error message reads:

    job aborted:
    [ranks] message

    [0] fatal error
    Fatal error in MPI_Send: Other MPI error, error stack:
    MPI_Send(176)...........: MPI_Send(buf=0x00000000002BFA24, count=1, MPI_INT, dest=1, tag=10, MPI_COMM_WORLD) failed
    MPIDI_CH3I_Progress(244): handle_sock_op failed
    ConnectFailed(1061).....: [ch3:sock] failed to connnect to remote process 1735B01A-0ACE-41e3-837C-BBFFEF33623D:1
    ConnectFailed(986)......: unable to connect to 192.168.1.2 on port 59023, exhausted all endpoints
    ConnectFailed(977)......: unable to connect to 192.168.1.2 on port 59023, A connection attempt failed because the connected party did not properly respond after a period of time,
                              or established connection failed because connected host has failed to respond.  (errno 10060)

    [1-3] terminated

    ---- error analysis -----

    [0] on SERVER1        (sometimes it is on SERVER2 or SERVER3, unable to connect to 192.168.1.3 or 192.168.1.4)
    mpi has detected a fatal error and aborted mpisar.exe

    ---- error analysis -----

    PS: 192.168.1.2 is SERVER2.

    On HPC Cluster Manager:
    When I submit a simple program (e.g. helloMPI.exe), it runs fine.
    http://img39.imageshack.us/img39/2372/75741408.jpg
    But when I submit the SARMPI.exe program, it fails.

    The HPC services are OK (screenshot):
    http://img39.imageshack.us/img39/6289/16866294.jpg

    All my PCs can ping each other on 192.168.1.X.

    So I think the error message points to a problem in the SARMPI code, but I don't know which part.
    Please help me, thanks!

    Thursday, March 4, 2010 6:27 AM

Answers

  • I solved my problem by changing my code:

    Before: SARMPI.exe  http://www.xun6.com/file/781f82222/SARMPI.cpp.html   state: Running
    After:  MPISAR.exe  http://www.xun6.com/file/66e677822/MPISAR.cpp.html   state: Finished

    Start:  for(i = 0; i <= 360; i++ )  →  if (myid == 0)

    What is the principle behind this!?

    Haha, I have already solved it.

    In the SARMPI.exe code:

    while ( i < 90)
    {
       i = 0 ;  ← unnecessary
    }
    • Marked as answer by YuJinSu Friday, March 5, 2010 7:12 AM
    Friday, March 5, 2010 7:06 AM

All replies

  • Hi Yujin,
    From the error message, it might be a firewall issue. If you are running your MPI apps on the enterprise-only network, you should configure the firewalls on all the nodes to open the ports for the MPI service and the MPI apps. Or simply turn off the firewall on each node. Please let me know whether this works in your case.

    Thanks,
    James
    Thursday, March 4, 2010 7:50 PM
  • Hi James Ren,

    I am sure the firewall is turned on on all PCs, but that doesn't solve the problem.
    Friday, March 5, 2010 12:59 AM
  • All the firewalls should be turned off. Did you mean turned on or off? If you meant that the firewalls are turned off and you still have the problem running the MPI apps, please copy the command you used for job submission.

    Thanks,
    James

    Friday, March 5, 2010 4:04 AM
  • Sorry! I meant that the firewall is turned off on all my PCs.

    On HPC Cluster Manager:
    command line: mpiexec sarmpi.exe
    Friday, March 5, 2010 6:01 AM
  • I solved my problem by changing my code:

    Before: SARMPI.exe  http://www.xun6.com/file/781f82222/SARMPI.cpp.html   state: Running
    After:  MPISAR.exe  http://www.xun6.com/file/66e677822/MPISAR.cpp.html   state: Finished

    Start:  for(i = 0; i <= 360; i++ )  →  if (myid == 0)

    What is the principle behind this!?
    Friday, March 5, 2010 6:29 AM