Failed to run an MPI job on Windows HPC 2012 R2

  • Question

  • I can run the program on two computers that are not part of our HPC cluster. I start smpd or msmpilaunchsvc on both computers and then run the following command on either computer. Everything works well.

    mpiexec -hosts 2 MachineA 1 MachineB 1 MPIApp.exe
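
    For reference, a minimal sketch of the two ways we start the launcher on each machine, assuming the MS-MPI launch service is registered under the name MSMPILaunchSvc (the exact name may differ with the MS-MPI version) and an elevated prompt:

    rem Option 1: run the smpd daemon in the foreground
    smpd -d

    rem Option 2: start the MS-MPI launch service (assumed service name)
    net start MSMPILaunchSvc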

     

    However, I couldn’t run the MPI program on our HPC cluster. I got the following error message:

    ERROR: Failed RpcCliStartMgr error -2147024809

    Aborting: mpiexec on MachineA is unable to connect to the smpd service on MachineB:8677

    Other MPI error, error stack:

    connect failed - The parameter is incorrect.  (errno -2147024809)

     

    I then tried to start smpd manually (by running smpd -d), but I got an access-denied error. Do we really need to start smpd manually on each compute node of a cluster?

    Any suggestions to fix the issue? Thank you very much!
    Friday, February 26, 2016 7:02 PM

All replies

  • Hi Yefeng,

    You do need to have either the smpd daemon (smpd -d) or msmpilaunchsvc running on each of the compute nodes. msmpilaunchsvc can be configured to start automatically when the machine boots, so you won't have to start it manually on each machine.
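
    For example, a minimal sketch of setting that up from an elevated command prompt, assuming the service is registered as MSMPILaunchSvc (the exact name may vary with your MS-MPI installation):

    rem Configure the MS-MPI launch service to start automatically at boot
    sc config MSMPILaunchSvc start= auto

    rem Start it now, without waiting for a reboot
    net start MSMPILaunchSvc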

    Tuesday, March 1, 2016 4:37 PM
  • Hi Yefeng,

    I might have misunderstood your original question. If you are running under HPC Pack, you will need to go through its job submission system to run MPI jobs. HPC Pack comes with its own MSMPI service running on each node, and that service handles authentication for job submissions, so running mpiexec manually will not work while the nodes are managed by HPC Pack. If you still want to run mpiexec manually on those nodes, you can try stopping HPC Pack's msmpi service by calling "net stop msmpi" and then starting smpd or the MS-MPI launch service on those nodes. Is there any reason you wouldn't want to submit the jobs using HPC Pack's job submission mechanism?
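
    For example, a minimal sketch of both options from a command prompt (MPIApp.exe and the node count are taken from your example; the job submit options your cluster needs may differ):

    rem Option 1: submit the MPI job through HPC Pack's scheduler
    job submit /numnodes:2 mpiexec MPIApp.exe

    rem Option 2: stop HPC Pack's MSMPI service and run mpiexec manually
    net stop msmpi
    smpd -d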

    Tuesday, March 1, 2016 10:05 PM