HPC Cluster mpiexec error

  • Question

  • Hi,

    I'm running a cluster with all nodes on HPC Pack 2016 version 5.2.6291.0 and am having an issue.

    Running a job such as: "job submit /numnodes:2 mpiexec -n 2 mpipingpong \\CFD01\Temp\results.xml" results in the following error message:

    ERROR: Failed RpcCliStartMgr error -2147024809

    Aborting: mpiexec on CFD01 is unable to connect to the smpd service on CFD01:8677
    Other MPI error, error stack:
    connect failed - The parameter is incorrect.  (errno -2147024809)

    When I run the MPI Ping-Pong diagnostics in Cluster Manager, they all succeed, so I'm at a loss as to why this is an issue. Running jobs from software such as Fluent gives the same error.

    Edit: Trying to start MsmpiLaunchSvc manually results in the error: Failed to start listening to PMI clients. Error=0x800704d9. (Commands for checking the service and the smpd port are sketched below the post.)

    Any ideas? Any help would be highly appreciated.

    Davin


    • Edited by Davy10 Friday, February 8, 2019 9:17 AM
    Friday, February 8, 2019 9:00 AM
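
    For reference, a minimal way to check the pieces named in the errors above, from an elevated PowerShell prompt. The service name, host name and port are taken from the error text; the commands themselves are a sketch, not output from the affected cluster:

        Get-Service MsmpiLaunchSvc            # is the MS-MPI launch service present and running?
        Start-Service MsmpiLaunchSvc          # the manual start attempt described in the edit above
        Test-NetConnection CFD01 -Port 8677   # is the smpd port quoted in the error reachable?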

Answers

  • Solved this by setting up the head node as an Active Directory domain controller and domain-joining the compute nodes. Previously the compute nodes were non-domain-joined, but apparently that doesn't work (a domain-join sketch follows below).

    • Marked as answer by Davy10 Sunday, February 10, 2019 8:53 AM
    Sunday, February 10, 2019 8:53 AM
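
    A minimal sketch of the compute-node side of that fix, run in PowerShell on each node. The domain name and account are placeholders for illustration, not details from this thread:

        Add-Computer -DomainName "hpc.local" -Credential "HPC\Administrator" -Restart   # joins the node to the head node's domain, prompts for the password, then reboots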