SMPD not found!

  • Question

  • I want to dispatch a pi calculation from the head node to WIN-MGPROO1QCSP:8677.

    But HPC shows me this message:

    Aborting: smpd on WIN-09EFU0M66NK is unable to connect to the smpd service on WIN-MGPROO1QCSP:8677
    Other MPI error, error stack:
    connect failed - The parameter is incorrect.  (errno -2147024809)
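
The errno in this message is a signed 32-bit Windows HRESULT, so it can be decoded by hand. The sketch below (the helper name `decode_hresult` is made up for illustration) splits the value into the standard HRESULT fields. For -2147024809 this gives 0x80070057: facility 7 (Win32) and code 87, which is the Win32 error whose text is "The parameter is incorrect", matching the message above. It only explains the number; it does not say which parameter smpd rejected.

```python
def decode_hresult(value: int) -> dict:
    """Split a signed 32-bit HRESULT into its standard fields.

    Hypothetical helper for illustration; not part of MS-MPI or HPC Pack.
    """
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # bit 31: severity (1 = failure)
        "facility": (unsigned >> 16) & 0x7FF,  # bits 16-26: facility code
        "code": unsigned & 0xFFFF,             # bits 0-15: error code
    }


info = decode_hresult(-2147024809)
print(info)
```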

    Thursday, April 7, 2016 1:14 AM

All replies

  • Update: I have verified that every machine was set up with the same versions of HPC Pack, MS-MPI, and the MS-MPI debugger.

    P.S. I want to run pi.exe across two nodes through Job Manager.

    The first host is the head node; the second is a compute node.

    This is my command:

    mpi /hosts 2 WIN-09EFUOM66NK 1 WIN-MGPROO1QCSP 1 pi.exe

    Thursday, April 7, 2016 1:43 AM
  • First, have you added pi.exe to the firewall's allowed list? If not, try running the command below on all nodes (from the admin console, select all the nodes and run a command):

               HpcFwUtil register myPiApp <path>\pi.exe

    Then, if you launch the MPI job from Job Manager, try this command:

              mpiexec <path>\pi.exe

    with the job's resource type set to Node and the task's resource count set to 2.


    Qiufang Shi

    Thursday, April 7, 2016 2:39 AM
  • I have turned off all firewall settings.

    I tried running only pi.exe on my compute node, but it shows this error:

    ERROR: Failed RpcCliStartMgr error -2147024809

    Aborting: mpiexec on WIN-09EFU0M66NK is unable to connect to the smpd service on WIN-MGPROO1QCSP:8677
    Other MPI error, error stack:
    connect failed - The parameter is incorrect.  (errno -2147024809)

    Then I went to my compute node and ran smpd -d, and found that smpd is unable to run in debug mode.

    I'm sure all the MS-MPI related files are installed properly. I don't know why...

    Thursday, April 7, 2016 4:06 AM
  • Can you try to run the MPI ping-pong diagnostics test through the admin console to check whether your system is well configured? 

    Qiufang Shi

    Thursday, April 7, 2016 9:07 AM
  • I got this result from the PingPong Latency test.

    • This node did not return diagnostics results. Possible reasons are: there was a network issue, or the files that are required for running the diagnostic test are not available on the node. If this is a custom diagnostic test that you added to the cluster, you need to verify that you have copied to all the nodes the files that are required for running the diagnostic test, especially to any new nodes that have joined the cluster. 

    • Edited by Yu Jie Ang Thursday, April 7, 2016 10:48 AM
    Thursday, April 7, 2016 10:45 AM
  • It looks like a network configuration issue. I suppose you have more than one network available, so please give me more information about the following:

    1. What does your network topology look like for all the machines, and what topology did you configure in the To-do List?

    2. What MPI network mask have you configured?

    3. Please also run the DNS test and the Ping test

    And share the results with us.
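
Alongside the built-in diagnostics, a rough manual check from the head node can show whether the compute node's name resolves and whether its smpd port accepts TCP connections at all. This is a minimal sketch, not an HPC Pack tool: the function name `check_endpoint` is hypothetical, and port 8677 is taken from the error message earlier in the thread. A successful connect only shows that something is listening; it does not prove smpd is healthy.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> dict:
    """Resolve a host name, then attempt a plain TCP connection to host:port.

    Hypothetical helper for illustration only.
    """
    result = {"host": host, "port": port, "resolved": None,
              "connected": False, "error": None}
    try:
        # Step 1: name resolution (roughly what a DNS test covers)
        result["resolved"] = socket.gethostbyname(host)
    except OSError as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    try:
        # Step 2: TCP reachability of the service port
        with socket.create_connection((result["resolved"], port), timeout=timeout):
            result["connected"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

For this thread, the call would be `check_endpoint("WIN-MGPROO1QCSP", 8677)`, run on the head node. If `resolved` comes back as an address on an unexpected network, that points toward the topology or MPI network mask questions above.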


    Qiufang Shi

    Friday, April 8, 2016 1:27 AM