Head node can't detect compute nodes

    Question

  • I have Windows 2008 R2 and HPC Pack 2008 R2-RC1 installed on all nodes.

    When I tried to add compute nodes in HPC Cluster Manager, no nodes showed up after I selected the node template.

    During the HPC Pack installation on the compute nodes, I selected the head node name. (The installation failed at the HPC Pack server components step, but since these are compute nodes, I assumed that error could be ignored.)

    The head node can see all the other nodes in the Windows network interface, though.

    The firewall is off for all network profiles (home, public, and domain).

    What else should I look at?

    Monday, September 20, 2010 1:27 AM

Answers

  • The head node is set up by default not to respond to PXE requests from unknown nodes.

    So, if you have not added the compute nodes to the head node via a node XML file (in which each node is identified by its MAC address), what you are seeing is the default behavior.

    There are two things you could do here:

    a) Create and import a node XML file for the compute nodes, and then power them on for a bare-metal deployment.

    b) If you are sure that the network over which the deployment occurs is private, change the settings on the head node so that it responds to all requests.

    The TechNet link below discusses both options.

    http://technet.microsoft.com/en-us/library/ff919449(WS.10).aspx

    pm
    Tuesday, October 19, 2010 7:07 PM
    Moderator
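A minimal Python sketch of option a) and of the default behavior described in the answer. It assumes the HPC Pack node XML layout as recalled here (a Nodes root in the http://schemas.microsoft.com/HpcNodeConfigurationFile/2007/12 namespace, containing Node, Template and MacAddress elements, with MAC addresses written as 12 hex digits without separators); verify these names against the TechNet schema before importing anything. The node names, MAC addresses and template name are hypothetical, and should_answer_pxe is only a simplified model of the head-node policy (answer known MACs by default, answer everything once configured to respond to all requests), not HPC Pack code.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace and element names recalled from the HPC Pack node XML
# schema; check them against the TechNet documentation before importing.
NODE_XML_NS = "http://schemas.microsoft.com/HpcNodeConfigurationFile/2007/12"
ET.register_namespace("", NODE_XML_NS)  # serialize as the default namespace


def _tag(name: str) -> str:
    """Qualify an element name with the node XML namespace."""
    return f"{{{NODE_XML_NS}}}{name}"


def normalize_mac(mac: str) -> str:
    """Return a MAC address as 12 upper-case hex digits with no separators."""
    digits = re.sub(r"[:.\-]", "", mac.strip()).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits


def build_node_xml(nodes: dict[str, list[str]], template_name: str) -> str:
    """Build a node XML document: one Node per name, identified by its MAC(s)."""
    root = ET.Element(_tag("Nodes"))
    for name, macs in nodes.items():
        node = ET.SubElement(root, _tag("Node"), Name=name)
        ET.SubElement(node, _tag("Template"), Name=template_name)
        for mac in macs:
            ET.SubElement(node, _tag("MacAddress")).text = normalize_mac(mac)
    return ET.tostring(root, encoding="unicode")


def known_macs(xml_text: str) -> set[str]:
    """Collect every MAC address listed in a node XML document."""
    root = ET.fromstring(xml_text)
    return {el.text for el in root.iter(_tag("MacAddress")) if el.text}


def should_answer_pxe(mac: str, known: set[str], respond_to_all: bool = False) -> bool:
    """Simplified model (not HPC Pack code) of the head node's PXE policy:
    by default only known MACs get an answer; option b) answers every request."""
    return respond_to_all or normalize_mac(mac) in known


if __name__ == "__main__":
    # Hypothetical node names, MAC addresses and template name.
    xml_text = build_node_xml(
        {"CN01": ["00-1B-21-04-ED-F5"], "CN02": ["00:30:1b:44:5f:02"]},
        template_name="Compute Node Template",
    )
    print(xml_text)
    known = known_macs(xml_text)
    print(should_answer_pxe("00:1B:21:04:ED:F5", known))  # True
    print(should_answer_pxe("AA:BB:CC:DD:EE:FF", known))  # False
    print(should_answer_pxe("AA:BB:CC:DD:EE:FF", known, respond_to_all=True))  # True
```

Saving the printed XML to a file and importing it on the head node corresponds to option a). The model function only illustrates why a node whose MAC address is missing from the imported file gets no PXE answer under the default setting.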

All replies

  • I have exactly the same problem. I do have the firewall on, but turning it off had no effect. MSFT consultants have not been able to fix the problem. You can't believe how many hours have been spent trying to get this thing going. I can only run jobs against the HN!
    Tuesday, October 5, 2010 12:41 PM
  • When you try to add a node, does any message show up in the provisioning log in the Cluster Manager interface? When you connect to these nodes via Remote Desktop and try to ping the head node, do you get responses?
    Tuesday, October 5, 2010 12:44 PM
  • Hi, mlinsgomes. On the HN, I brought the WN offline and then deleted it. Now, when I try to add it using Cluster Manager running on the HN, it cannot find the configured WN I just deleted. So I went to the (former) WN machine and tried to add the node there. That worked. Is this normal? Shouldn't the HN find a configured WN? The reason I ask is that I haven't been able to get other machines working as either CNs or WNs. I can only run jobs against the HN. When I use Excel 2010 with HPC, I consistently get "Denied Access." However, with all nodes online, all diagnostics pass.

    Tuesday, October 5, 2010 2:04 PM
  • Well, I thought that some configuration on the WN might take longer than the Cluster Manager timeout, but when I added WNs here in my configuration, they worked fine. I did this from bare metal, with a template that deploys an OS image and performs the installation over PXE. And the HN can see a configured WN: when a WN goes down or is rebooted, it re-establishes communication.

    About your jobs: did you also configure your HN to act as a WN?

    Tuesday, October 5, 2010 2:15 PM
  • We are having a similar problem. We believe our problem is in the DHCP setup. We have multiple computers ready to go from bare metal. Right now I would be happy just to join two systems! We have tried many configurations and bought different NICs, but the head node does not see our planned compute nodes during PXE boot.

    A little off subject for a second: we assumed that we could run any Windows-supported program, like 3DS Max, but I am starting to doubt that assumption. When we get this thing running, will it let us cluster resources (CPUs and RAM), put our application on the joined resources, and render?

    Tuesday, October 5, 2010 2:46 PM
  • I did this again: I deleted and re-added the WN from the HN. It took a very long time (10+ minutes), but eventually the HN found the configured but unattached WN. And I'm only talking about a search across 10 or so machines, of which 4 are HPC nodes!
    Tuesday, October 5, 2010 7:37 PM