Trouble Provisioning Nodes

  • Question

  • Using HPC Server 2008 (no SP1; it won't install).

    Suddenly I cannot provision new nodes.  They install fine until they get to this point:

    Time    Message
    7/13/2010 7:10:44 PM    Failed to execute the change on the target node
    7/13/2010 7:10:44 PM    Could not contact node 'NODE-11' to perform change. The management service was unable to connect to the node using any of the ip addresses resolved for the node.
    7/13/2010 7:10:41 PM    Checking the configuration of compute node CLUSTER\NODE-11.
    7/13/2010 7:10:40 PM    Cleaning up HPC Pack Install Data

     

    and then they revert.  This is the same setup I've used successfully in the past, and I'm not sure where to look.  I've restarted AD, DNS, DHCP, HPC Management, etc. before imaging, but I get the same result.  The hardware is identical to the other nodes.  How do I fix this?

     

     

    Wednesday, July 14, 2010 4:05 AM

Answers

  • Some questions for you.

    Can you tell me the network topology of your cluster? You say you are deploying from bare metal; do you have the DHCP server on the head node? Do you see an IP address for NODE-11?

    How many compute nodes are you trying to deploy? Can you post the entire provisioning log? The snippet does not have enough context to debug. What do you see on the compute node's screen?

    thanks


    pm
    Wednesday, July 14, 2010 11:00 PM
    Moderator

All replies

  • Couple of questions:

    1. Why can't you install SP1?  The only time I've seen this happen is when I installed HPC Pack 2008 on top of Windows Server 2008 R2 and tried to update to SP1.
    2. Run the "All Services Running" diagnostic on all nodes and see if you get an error back.  My guess is that the "Management Service" on the node you are trying to configure is not running, or DNS is messed up.  Try running the "DNS" diagnostic as well.
    Thanks.
    Wednesday, July 14, 2010 6:40 PM
  • Thanks for the reply.

     

    1. I don't recall the reason SP1 fails to install.  I've installed my head node with a clean Windows 2008 HPC install and then immediately tried to install SP1.  I also tried it after configuring a cluster.  I looked into the reason but couldn't find anything in the forums, support, etc. that indicated why.  Maybe I'll try again, starting a cluster on one of the nodes, and see if I can discover the reason it doesn't install.

    2. I am provisioning them from bare metal, so I don't think the "All Services Running" diagnostic will help.  It doesn't seem to be able to contact the node.  I tried pinging the nodes by host name, and it seems that there is no DNS entry for the newly provisioned nodes.  Where do I go from here?  What DNS diagnostics can I run?  There's nothing suspicious in the logs.  Where do I look now?
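
    As a rough illustration of the host-name check described above, here is a minimal Python sketch that asks the local resolver whether a node name maps to any IP address. It is not part of HPC Pack; the function names `resolve_node` and `check_nodes` are made up for this example, and it assumes Python is available on the machine you run it from (for instance the head node).

```python
import socket


def resolve_node(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname.

    An empty list means the name did not resolve (for example, no DNS entry).
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string. A set removes duplicates.
    return sorted({info[4][0] for info in infos})


def check_nodes(hostnames):
    """Map each host name to the list of addresses it resolves to."""
    return {name: resolve_node(name) for name in hostnames}


if __name__ == "__main__":
    # NODE-11 is the node name from the provisioning log above.
    for name, addrs in check_nodes(["NODE-11"]).items():
        print(name, "->", ", ".join(addrs) if addrs else "no DNS entry found")
```

    A node that reports no addresses here would be consistent with the "unable to connect to the node using any of the ip addresses resolved for the node" message, though it does not by itself explain why the record is missing.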

     

    Wednesday, July 14, 2010 7:09 PM