N00b trying to build small cluster but nodes fail during provisioning

  • Question

  • I downloaded the 180-day trial of Windows HPC Server 2008 and I'm trying to get it going on two identical computers. The only difference is that the compute node has no DVD drive. I'm trying to add the nodes from bare metal. I created a node template which includes all the drivers for the motherboard and an image created from the DVD. A few strange things have happened. First, when I added the node for the first time, everything seemed to work fine, but then I got an error on the node saying that the OS was unable to start - something about hardware may have changed, restart the computer with the disk, etc. I thought that was odd, so I tried again. The same thing happened. Now when I add the node, the node naming does not match and it hangs waiting for admin approval (the name in the admin approval dialog box is different from what the node reports during provisioning initialization). I've restarted the node naming on the head node, but the node itself reports a higher number every time I retry (node02, then node03, then node04, etc.).

    How can I get the node installed?
    Tuesday, March 10, 2009 1:01 PM

Answers

  • You can go back to the to-do list and reset the naming series to start from whatever number you'd like; this should take care of the name of the compute node.
    I have a couple of questions about your node deployment issue, though.
    a) Once the deployment has failed, do you delete the failed node entry?
    b) Do you apply the same node template again?
    c) When you look at the provisioning logs (go to the node management pane, select the 'provisioning' node, and in the bottom half of the center pane you will see a tab for 'provisioning logs'), what do they say the deployment error was?
    d) You wrote: "First, when I added the node for the first time everything seemed to work fine but then I got an error on the node saying that the OS was unable to start - something about hardware may have changed, restart the computer with the disk, etc."
    Where did you see this error? On the compute node when you tried to log on to it, or in the logs?


     thanks
    -parmita
    pm
    Thursday, March 26, 2009 5:54 PM
    Moderator

All replies

  • I did go back and restart the node naming and it seemed to work, but the node name does not match the name on the head node. I did delete the failed node entry and I did apply the same node template again. (I know, don't do the same thing and expect a different result, but I tried anyway.) I'll look at the logs and give you more detailed answers when I get time tonight. Thank you for the help! I was beginning to think I was on my own!
    Thursday, March 26, 2009 6:02 PM