Failing to add a preconfigured compute node to a cluster

    Question

  • Hi,

    I'm trying to add a preconfigured compute node to a cluster. Each box in the cluster has a single network card with a single IP address, so all nodes are on the 'enterprise network'. After I complete the Add Node Wizard, the node is added with status Offline and with the following errors (I see the same errors when I try to apply a template to the newly created node):

    10/26/2011 8:22:52 PM    Failed to execute the change on the target node
    10/26/2011 8:22:52 PM    Could not contact node 'WIN-HPC2' to perform change. The management service was unable to connect to the node using any of the ip addresses resolved for the node.
    System.Security.Principal.IdentityNotMappedException: Some or all identity references could not be translated.
       at System.Security.Principal.SecurityIdentifier.Translate(IdentityReferenceCollection sourceSids, Type targetType, Boolean forceSuccess)
       at System.Security.Principal.SecurityIdentifier.Translate(Type targetType)
       at System.Security.Principal.WindowsIdentity.GetName()
       at System.Security.Principal.WindowsIdentity.get_Name()
       at Microsoft.SystemDefinitionModel.SdmServerChannel.ChannelAuth.IsConnectingIdentityAuthorized(IIdentity identity)
       at System.Runtime.Remoting.Channels.Tcp.TcpServerChannel.AcceptSocketCallback(IAsyncResult ar)
    10/26/2011 8:22:51 PM    Checking the configuration of compute node HPC\WIN-HPC2.
    10/26/2011 8:22:51 PM    Associating template Default ComputeNode Template with compute node HPC\WIN-HPC2
    10/26/2011 8:22:51 PM    Moving node HPC\WIN-HPC2 from state Unknown to state Provisioning.
    10/26/2011 8:22:51 PM    Assigning template Default ComputeNode Template to node WIN-HPC2.

    How can I troubleshoot this (logs, etc.)? What exactly is going wrong? All diagnostic tests passed, the event logs show nothing, and I can ping the host, etc.

    Any ideas?

    Thanks.


    Andrei.

    • Edited by zandr Wednesday, October 26, 2011 6:17 PM
    Wednesday, October 26, 2011 4:37 PM
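    The log excerpt in the question is listed newest-first, and the .NET stack trace is printed under the entry that raised it. When comparing several such dumps, a small parser can help put them in order. This is a minimal sketch, assuming the log has been copied as plain text in exactly the format shown; the regex and function names are illustrative only and are not part of HPC Pack.

```python
import re
from datetime import datetime

# A new entry starts with a timestamp such as "10/26/2011 8:22:52 PM".
ENTRY_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$")
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_node_log(text):
    """Parse a pasted node log into (timestamp, message, detail_lines), oldest first."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), TIME_FORMAT)
            entries.append((stamp, match.group(2), []))
        elif entries:
            # Lines without a timestamp (e.g. a .NET stack trace) are attached
            # to the entry printed just above them.
            entries[-1][2].append(line)
    # Assumes the listing is newest-first, as in the excerpt: reverse it, then
    # stable-sort by timestamp so entries within the same second keep their order.
    entries.reverse()
    entries.sort(key=lambda entry: entry[0])
    return entries


def entries_with_details(entries):
    """Return the messages of entries that carry extra lines such as a stack trace."""
    return [message for _, message, details in entries if details]
```

    Calling entries_with_details on the parsed excerpt points straight at the "Could not contact node" entry, which is the one carrying the IdentityNotMappedException trace.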

Answers

  • As I understand it, the problem occurs because the systems were cloned. As a result, two or more systems have the same GUID.
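    If you suspect cloning, you can first check whether several nodes report the same identifier. This is a minimal sketch, assuming you have already collected one identifier string per node by whatever inventory method you use; the input dictionary and the function name are hypothetical, not an HPC Pack API.

```python
from collections import defaultdict


def find_shared_identifiers(node_ids):
    """Map each identifier used by more than one node to the sorted list of those nodes.

    node_ids: dict of node name -> identifier string (hypothetical inventory data).
    Identifiers are compared case-insensitively, since GUID/SID text may vary in case.
    """
    owners = defaultdict(list)
    for node, ident in node_ids.items():
        owners[ident.strip().upper()].append(node)
    return {ident: sorted(nodes) for ident, nodes in owners.items() if len(nodes) > 1}
```

    Any node that shows up in the result is a candidate for the Sysprep step described in this answer.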

    You need to run Sysprep with the Generalize option selected on every cloned system. The cloned system will then restart, and you will need to join it to the domain again (don't forget to delete its computer account in AD first). When you add it to the cluster, the problem will disappear.
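    The Sysprep step can be scripted on each clone. This is a minimal sketch, assuming Sysprep is at its default location under %SystemRoot% and that you want the machine to boot into OOBE after generalizing; the function names are hypothetical. It must run elevated on the clone itself, and it defaults to a dry run. Deleting the old computer account in AD and rejoining the domain are still manual steps.

```python
import ntpath
import os
import subprocess
import sys


def sysprep_command(windows_dir=None, reboot=True):
    """Build the Sysprep command line that generalizes a cloned installation.

    /generalize removes system-specific data from the installation, /oobe makes
    the next boot run Windows Welcome (OOBE), and /reboot or /shutdown controls
    what happens when Sysprep finishes.
    """
    if windows_dir is None:
        windows_dir = os.environ.get("SystemRoot", r"C:\Windows")
    # ntpath builds a Windows-style path even if this script runs elsewhere.
    exe = ntpath.join(windows_dir, "System32", "Sysprep", "sysprep.exe")
    return [exe, "/generalize", "/oobe", "/reboot" if reboot else "/shutdown"]


def run_sysprep(dry_run=True):
    """Run Sysprep on this machine (Windows only, elevated). Only prints by default."""
    cmd = sysprep_command()
    if dry_run or sys.platform != "win32":
        print("Would run:", " ".join(cmd))
        return None
    # check=True raises CalledProcessError if Sysprep exits with a non-zero code.
    return subprocess.run(cmd, check=True)
```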


    Hope this will help you


    Please don't forget to vote as helpful and mark as answered if the answer helped solve your problem.
    • Proposed as answer by SergiiKorin Thursday, October 27, 2011 2:42 PM
    • Marked as answer by zandr Friday, October 28, 2011 8:47 AM
    Thursday, October 27, 2011 2:42 PM
  • Same situation, same workaround: http://social.microsoft.com/Forums/en-US/windowshpcitpros/thread/c8d5678e-056b-4ddb-aff6-fa863f9ebc84/

    The boxes were actually VMware clones. When I created the compute node from scratch instead of cloning it in VMware, everything went fine.

    My only remaining question is about troubleshooting: is there more detailed log information than what I posted in the initial message, which is not very informative?


    Andrei.



    • Marked as answer by zandr Thursday, October 27, 2011 11:46 AM
    • Edited by zandr Thursday, October 27, 2011 11:50 AM
    Thursday, October 27, 2011 11:45 AM

All replies

  • Hi,

    Try turning off the firewall and check that all HPC services are started on 'WIN-HPC2'. You can also check that the head node resolves the IP address of WIN-HPC2 correctly.
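    The name-resolution and reachability checks suggested above can be run from the head node with a few lines of Python. This is a minimal sketch. The port argument is whichever HPC service port you want to test. The sketch deliberately does not assume a particular HPC Pack port number, so check your version's documentation for it. The function names are hypothetical.

```python
import socket


def resolve_ipv4(host):
    """Return the sorted IPv4 addresses the local resolver gives for host ([] if none)."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host, expected_ip, port):
    """Check that host resolves to expected_ip and that the given TCP port accepts connections."""
    addresses = resolve_ipv4(host)
    return {
        "addresses": addresses,
        "resolves_correctly": expected_ip in addresses,
        "port_open": can_connect(host, port),
    }
```

    A passing result only shows basic connectivity. In this thread, the node was reachable and the real cause turned out to be cloning, so these checks cannot rule that out.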

     


    • Edited by SergiiKorin Thursday, October 27, 2011 11:07 AM
    Thursday, October 27, 2011 11:04 AM
  • Thanks a lot. I just cloned a new VM, and after running Sysprep with the Generalize option turned on, everything worked perfectly.

    Regards,


    Andrei.
    Friday, October 28, 2011 11:59 AM