Adding a preconfigured node failed during provisioning

    Question

  • Hi!

    I have a cluster of 8 compute nodes in the domain developer.com, and the head node's name is developer-wccs.developer.com.
    To back up the domain controller and DNS server, I deployed another Windows HPC Server 2008 machine that acts only as a second domain controller and backup DNS server.
    It is named developer-wccs2.developer.com.

    A few days ago, my original head node crashed because of a disk failure, so I had to recover the cluster. Since developer-wccs2 already hosts the domain and DNS, I deployed HPC Pack on it and made it the new head node.

    The new head node works OK, and I chose topology #1 for the cluster network, the same as before.
    I created a node template without an operating system image and used a node XML file to add the existing compute nodes to the new head node, since none of the compute nodes needs to be deployed again.

    When I add a compute node, provisioning fails with an error. The log is as follows:

    Time    Message
    2009/8/25 2:50:23    Reverted
    2009/8/25 2:50:23    Dissasociating template from compute node DEVELOPER\COMPUTE-3-1
    2009/8/25 2:50:23    The parent operation is being rolled back
    2009/8/25 2:50:23    The parent operation is being rolled back
    2009/8/25 2:50:23    The operation failed due to errors during execution.
    2009/8/25 2:50:23    The operation failed and will not be retried.
    2009/8/25 2:50:23    The compute node failed to execute the operation.
    2009/8/25 2:50:23    The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information.
    2009/8/25 2:50:17    Checking the configuration of compute node DEVELOPER\COMPUTE-3-1.
    2009/8/25 2:50:17    The compute node failed to execute the operation.
    2009/8/25 2:50:17    The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information.
    2009/8/25 2:50:11    Checking the configuration of compute node DEVELOPER\COMPUTE-3-1.
    2009/8/25 2:50:11    The compute node failed to execute the operation.
    2009/8/25 2:50:11    The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information.
    2009/8/25 2:50:05    Checking the configuration of compute node DEVELOPER\COMPUTE-3-1.
    2009/8/25 2:50:05    The compute node failed to execute the operation.
    2009/8/25 2:50:05    The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information.
    2009/8/25 2:50:01    Checking the configuration of compute node DEVELOPER\COMPUTE-3-1.
    2009/8/25 2:50:01    The compute node failed to execute the operation.
    2009/8/25 2:50:01    The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information.
    2009/8/25 2:49:55    Checking the configuration of compute node DEVELOPER\COMPUTE-3-1.
    2009/8/25 2:49:55    Associating template Dualboot ComputeNode Template with compute node DEVELOPER\COMPUTE-3-1
    2009/8/25 2:49:55    Moving node DEVELOPER\COMPUTE-3-1 from state Unknown to state Provisioning.
    2009/8/25 2:49:55    Assigning template Dualboot ComputeNode Template to node COMPUTE-3-1.

    The core error appears to be: "The Management service encountered an error while performing a change on this node. Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'. Check the operation log in the Administration Console for more information."
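    To pull the recurring error out of an operation log like the one above, a short script can group identical messages and count them. This is a minimal Python sketch, assuming the log has been copied out as plain text with one "date time message" entry per line; `summarize_errors` is a hypothetical helper, not part of HPC Pack.

```python
import re
from collections import Counter

# One log entry per line: "<yyyy/m/d> <h:mm:ss>    <message>", as in the log above.
LOG_LINE = re.compile(r"^\s*(\d{4}/\d{1,2}/\d{1,2})\s+(\d{1,2}:\d{2}:\d{2})\s+(.*\S)\s*$")


def summarize_errors(log_text, keyword="error"):
    """Return (message, count) pairs for messages containing `keyword`, most frequent first."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match is None:
            continue  # header rows such as "Time    Message" do not match
        message = match.group(3)
        if keyword.lower() in message.lower():  # case-insensitive keyword filter
            counts[message] += 1
    return counts.most_common()
```

    Applied to the log above, the "Access is denied to user 'DEVELOPER\DEVELOPER-WCCS2$'" message should come out on top with five occurrences, one per configuration check before the rollback.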

    I have tried turning off the firewall on compute node COMPUTE-3-1, and the same error occurred.


    Has anyone seen this before? Any comments or suggestions would be appreciated!

    Thank you very much!



    Chu Qiu
    Monday, August 24, 2009 1:58 PM

Answers

  • I have resolved the problem. The cause was this:
    in the node XML, I had set two MacAddress entries, one of them for a port with no cable connected, and the Cluster Manager used that port to connect to the compute node.
    That is why the error occurred.

    thank you
    chu
    Chu Qiu
    Tuesday, August 25, 2009 1:48 PM
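    The fix above was to keep only the cabled adapter's MAC address in the node XML. On a larger cluster that clean-up can be scripted. The Python sketch below assumes an abbreviated node XML layout in which each node is a `<Node Name="...">` element with one `<MacAddress>` child per adapter; the real HPC Pack 2008 file carries more elements and its own namespace, which the code sidesteps by matching local tag names. The `cabled` mapping (node name to connected MACs) is hypothetical input you would collect yourself, for example by noting which adapters `ipconfig /all` on each node does not report as "Media disconnected".

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Return an element's tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def _norm(mac):
    """Normalize a MAC string: uppercase hex digits, no '-' or ':' separators."""
    return mac.strip().upper().replace("-", "").replace(":", "")


def prune_unplugged_macs(xml_text, cabled):
    """Remove <MacAddress> entries whose MAC is not listed in `cabled`.

    `cabled` maps node name -> iterable of MACs confirmed to be connected.
    Nodes missing from the mapping are left untouched.
    Returns (updated XML text, {node name: [removed MACs]}).
    """
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # Re-emit the file's default namespace instead of "ns0:" prefixes
        # (note: register_namespace changes module-wide state).
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    removed = {}
    nodes = [el for el in root.iter() if _local(el.tag) == "Node"]
    for node in nodes:
        name = node.get("Name")
        if name not in cabled:
            continue
        keep = {_norm(m) for m in cabled[name]}
        # Copy the child list so removing entries does not disturb iteration.
        for mac in [c for c in node if _local(c.tag) == "MacAddress"]:
            if _norm(mac.text or "") not in keep:
                node.remove(mac)
                removed.setdefault(name, []).append((mac.text or "").strip())
    return ET.tostring(root, encoding="unicode"), removed
```

    The returned text can be saved as a new node XML file for the import, and the `removed` dictionary shows exactly which entries were dropped so they can be checked first.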

All replies

  • Chu,

    Are these compute nodes originally part of 'developer-wccs.developer.com'? If so, you will need to move each compute node to the new head node by editing the following registry value so that it names the new head node:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HPC
    ClusterName
    REG_SZ
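    The change above is a single string value on each compute node. One way to script it is to generate a .reg file and import it on the node. This Python sketch assumes that `ClusterName` should hold the new head node's host name (`DEVELOPER-WCCS2` in this thread) and that the key path is exactly the one quoted above; `cluster_name_reg` is a hypothetical helper.

```python
def cluster_name_reg(head_node):
    """Build .reg text that sets the ClusterName value under HKLM\\SOFTWARE\\Microsoft\\HPC."""
    # String data in .reg files must escape backslashes and double quotes.
    value = head_node.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\HPC]\n"
        f'"ClusterName"="{value}"\n'
    )
```

    On Windows, writing the result with `open("clustername.reg", "w", encoding="utf-16", newline="\r\n")` produces UTF-16 text with CRLF line endings. Back up the key first with `reg export HKLM\SOFTWARE\Microsoft\HPC hpc-backup.reg`, then run `reg import clustername.reg` from an elevated prompt on each compute node.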

    If the Add Node Wizard still fails, check the Administrators group on the compute node to ensure that the new head node's machine account was properly added.
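    The Administrators check can also be scripted. Run on the compute node, `net localgroup Administrators` prints the group's members below a dashed line; the Python sketch below looks for the head node's machine account (`DOMAIN\HEADNODE$`) in that output. It only sees direct members, not accounts that get admin rights through a nested domain group, and `machine_account_is_admin` is a hypothetical helper.

```python
def machine_account_is_admin(net_localgroup_output, domain, head_node):
    """Return True if DOMAIN\\HEADNODE$ is a direct member in `net localgroup` output."""
    target = f"{domain}\\{head_node}$".lower()  # computer accounts end with '$'
    in_members = False
    for line in net_localgroup_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            in_members = True  # member names follow the dashed rule
            continue
        if in_members and stripped.lower() == target:
            return True
    return False
```

    If it returns False for DEVELOPER\DEVELOPER-WCCS2$, running `net localgroup Administrators DEVELOPER\DEVELOPER-WCCS2$ /add` from an elevated prompt on the node should add the account.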

    Thanks,
    Ben
    Monday, August 24, 2009 6:55 PM
  • Ben,
    Thank you for your reply!

    All the compute nodes and both head nodes are in the same domain, "developer.com", and I used the domain administrator account to perform the deployment.

    Chu

    Chu Qiu
    Monday, August 24, 2009 11:21 PM