Asked by:
SMPD not found!

Question
-
I want to dispatch a pi calculation from the head node to WIN-MGPROO1QCSP:8677,
but HPC shows me this message:
Aborting: smpd on WIN-09EFU0M66NK is unable to connect to the smpd service on WIN-MGPROO1QCSP:8677
Other MPI error, error stack:
connect failed - The parameter is incorrect. (errno -2147024809)
Thursday, April 7, 2016 1:14 AM
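For readers decoding the error number: the errno above looks like a Windows HRESULT printed as a signed 32-bit integer. A minimal Python sketch, assuming that interpretation, splits it into its facility and code fields. The helper name `decode_hresult` is made up for illustration and is not part of MS-MPI or HPC Pack.

```python
def decode_hresult(value):
    """Split a signed 32-bit HRESULT into (hex string, facility, code)."""
    unsigned = value & 0xFFFFFFFF        # reinterpret the signed value as unsigned 32-bit
    facility = (unsigned >> 16) & 0x7FF  # 11-bit facility field
    code = unsigned & 0xFFFF             # low 16 bits hold the error code
    return f"0x{unsigned:08X}", facility, code


print(decode_hresult(-2147024809))  # prints ('0x80070057', 7, 87)
```

Facility 7 is FACILITY_WIN32 and code 87 is ERROR_INVALID_PARAMETER, whose text is "The parameter is incorrect." The number therefore restates the message and does not by itself say which parameter was rejected.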
All replies
-
Update: I have verified that every machine was set up with the same version of HPC Pack, MS-MPI, and the MS-MPI debugger.
P.S. I want to run pi.exe across two nodes through Job Manager.
The first host is the head node; the second is a compute node.
This is my command:
mpi /hosts 2 WIN-09EFUOM66NK 1 WIN-MGPROO1QCSP 1 pi.exe
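The head node name is spelled WIN-09EFUOM66NK in this command but WIN-09EFU0M66NK in the error output quoted earlier. This may be only a transcription slip in the post, but a mistyped host name is easy to miss by eye. A small Python sketch for comparing two names character by character follows; `diff_positions` is a hypothetical helper written for this illustration.

```python
def diff_positions(a, b):
    """Return (index, part_of_a, part_of_b) for each place where a and b differ."""
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        # Report the leftover tail of the longer string as one final difference.
        shorter = min(len(a), len(b))
        diffs.append((shorter, a[shorter:], b[shorter:]))
    return diffs


# The two spellings as they appear in this thread.
print(diff_positions("WIN-09EFUOM66NK", "WIN-09EFU0M66NK"))  # prints [(9, 'O', '0')]
```

The single difference is the letter O versus the digit 0 at index 9. Which spelling is the real machine name cannot be determined from the thread.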
- Proposed as answer by qiufang shi (Microsoft employee) Thursday, April 7, 2016 2:40 AM
- Unproposed as answer by Yu Jie Ang Thursday, April 7, 2016 7:57 AM
Thursday, April 7, 2016 1:43 AM -
First, have you added pi.exe to the firewall's allowed list? If not, try running the command below on all nodes (from the admin console, select all the nodes, then choose to run a command):
HpcFwUtil register myPiApp <path>\pi.exe
Then, if you launch the MPI job from Job Manager, try this command:
mpiexec <path>\pi.exe
with the job's resource type set to Node and the task's resource count set to 2.
Qiufang Shi
Thursday, April 7, 2016 2:39 AM -
I have turned off all firewall settings.
I tried running only pi.exe on my compute node, but it shows this error:
ERROR: Failed RpcCliStartMgr error -2147024809
Aborting: mpiexec on WIN-09EFU0M66NK is unable to connect to the smpd service on WIN-MGPROO1QCSP:8677
Other MPI error, error stack:
connect failed - The parameter is incorrect. (errno -2147024809)
Then I went to my compute node and checked with smpd -d. I found that smpd was unable to run in debug mode.
I'm sure all the MS-MPI related files are installed properly. I don't know why this happens.
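One way to separate a network problem from an smpd problem is to test whether anything accepts TCP connections on the port named in the error. The sketch below is a generic Python check, not an HPC Pack or MS-MPI tool. It assumes Python is available on the head node, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections, and timeouts.
        return False


# Example call from the head node, using the host and port from the error message:
# can_connect("WIN-MGPROO1QCSP", 8677)
```

A False result means either the name did not resolve, the port is blocked, or nothing is listening there. A True result only shows that something accepted the connection, not that it is a working smpd service.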
Thursday, April 7, 2016 4:06 AM -
Can you try to run the MPI ping-pong diagnostics test through the admin console to check whether your system is well configured?
Qiufang Shi
Thursday, April 7, 2016 9:07 AM -
I got this result from the PingPong Latency test:
- This node did not return diagnostics results. Possible reasons are: there was a network issue, or the files that are required for running the diagnostic test are not available on the node. If this is a custom diagnostic test that you added to the cluster, you need to verify that you have copied to all the nodes the files that are required for running the diagnostic test, especially to any new nodes that have joined the cluster.
- Edited by Yu Jie Ang Thursday, April 7, 2016 10:48 AM
Thursday, April 7, 2016 10:45 AM -
This looks like a network configuration issue. I suppose you have more than one network available, so please give me more information about the following:
1. What does your network topology look like across all the machines, and which topology did you configure in the To-do list?
2. What MPI network mask have you configured?
3. Please also run the DNS test and the Ping test, and share the results with us.
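Alongside the built-in DNS test, name resolution can be spot-checked by hand from each machine. The following Python sketch is an illustration only and does not replace the HPC diagnostics. `resolve` is a hypothetical helper, and the host names in the example comment are the ones quoted in this thread.

```python
import socket


def resolve(host):
    """Return the sorted addresses the local resolver gives for host, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address string.
    return sorted({info[4][0] for info in infos})


# Example: run on both the head node and the compute node and compare the results.
# for name in ("WIN-09EFU0M66NK", "WIN-MGPROO1QCSP"):
#     print(name, resolve(name))
```

If a name resolves to several addresses, the machine probably has more than one network interface, which is relevant to the topology and MPI network mask questions above.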
Qiufang Shi
Friday, April 8, 2016 1:27 AM