No MPI - clusrun doesn't work

  • Question

  • Hi,

    I am an absolute newbie with HPC and I hope someone can help me. I have a small cluster with 1 head node and 2 compute nodes. If I run a job on one compute node only, it works fine. But if I start a job which should run on both compute nodes, the job runs only on the first node. I then tried "clusrun smpd -status" and got the following error:

    ****************
    Command has failed on node CLUSTERHEAD. Message:Task failed during execution wit
    h exit code 1. Please check task's output for error details.
    -------------------------- CLUSTERHEAD returns 1 --------------------------
    'smdp' is not recognized as an internal or external command,
    operable program or batch file.
    Command has failed on node CLUSTERPC02. Message:Error from node: CLUSTERPC02:Log
    on failure: unknown user name or bad passwordException of type 'Microsoft.Hpc.Ac
    tivation.NodeManagerException' was thrown.
    -------------------------- CLUSTERPC02 returns 1 --------------------------
    Command has failed on node CLUSTERPC01. Message:Error from node: CLUSTERPC01:Log
    on failure: unknown user name or bad passwordException of type 'Microsoft.Hpc.Ac
    tivation.NodeManagerException' was thrown.
    -------------------------- CLUSTERPC01 returns 1 --------------------------

    -------------------------- Summary --------------------------
    0 Nodes succeeded
    3 Nodes failed:CLUSTERHEAD,CLUSTERPC01,CLUSTERPC02
    ***************

    Using the command "clusrun /all hostname.exe", I get the following error message:

    ***************
    Command proxy has failed on node CLUSTERPC01. Message:Error from node: CLUSTERPC
    01:Logon failure: unknown user name or bad passwordException of type 'Microsoft.
    Hpc.Activation.NodeManagerException' was thrown.
    Command has been canceled on node CLUSTERHEAD. Message:Command output proxy has
    failed.
    Command has been canceled on node CLUSTERPC01. Message:Command output proxy has
    failed.
    Command has been canceled on node CLUSTERPC02. Message:Command output proxy has
    failed.

    -------------------------- Summary --------------------------
    0 Nodes succeeded
    3 Nodes failed:CLUSTERHEAD,CLUSTERPC01,CLUSTERPC02
    ****************

    Sorry, but I don't know what this means. Can anyone help me?

    Thank you, Albert
    • Moved by parmita mehta (Moderator), Wednesday, August 19, 2009 2:53 AM (From: Windows HPC Server Deployment, Management, and Administration)
    Monday, August 3, 2009 2:30 PM

Answers

  • Albert,

    Judging by the "unknown user name or bad password" errors in the output, this looks like a logon issue. Are you running these commands on CLUSTERPC01 with a local account?

    Thanks,
    Ben
    Thursday, August 6, 2009 5:17 PM
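
    As a rough aid for reading clusrun output like the one quoted in the question, the sketch below groups the per-node reports by apparent cause. It assumes only the line format visible in the pasted output ("Command has failed on node NAME. Message:..."). The function name summarize_clusrun and the cause labels are hypothetical and are not part of HPC Pack.

```python
import re

# Start of a per-node report, as it appears in the pasted output.
NODE_RE = re.compile(
    r"Command(?: proxy)? has (?:failed|been canceled) on node (\w+)\. Message:"
)

# Illustrative cause labels, keyed by a whitespace-free marker string.
CAUSES = [
    ("Logonfailure", "credentials rejected"),
    ("isnotrecognized", "command not found"),
    ("outputproxyhasfailed", "canceled after proxy failure"),
]


def summarize_clusrun(output):
    """Return {node: cause label}, keeping the first cause seen per node."""
    # The console wraps at a fixed width and can split words, so rejoin lines.
    flat = "".join(line.strip() for line in output.splitlines())
    # Ignore the trailing summary block; only per-node reports are classified.
    flat = re.split(r"-+ Summary -+", flat)[0]

    matches = list(NODE_RE.finditer(flat))
    result = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        # Drop all whitespace so markers still match across wrapped lines.
        compact = re.sub(r"\s+", "", flat[match.end():end])
        label = "unknown"
        for marker, cause in CAUSES:
            if marker in compact:
                label = cause
                break
        result.setdefault(match.group(1), label)
    return result
```

    Fed the first block of output from the question, this would separate the head node (where the command name itself was not recognized) from the two compute nodes (where the logon was rejected), which suggests two different problems rather than one.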