Problem with head node crashing when deploying compute nodes


  • Hi

    I seem to be having a problem when trying to deploy compute nodes. I have been successful in installing one node.

    The problem seems to turn up after the image has been copied across: the compute node tries to access the head node and fails. The head node then crashes (not the OS, just CCP). Every time I try to start the cluster manager I get "unable to connect", and I need to reinstall the head node.

    NOTE: I am running the head node under VMware. Everything else seems to work fine, and I was able to install one node. The problem only occurred when I tried to install the other three at the same time. They are consistently stuck in the provisioning state.

    I am going to try uninstalling CCP and reinstalling it to see if that makes any difference.

    July 10, 2008 2:28



  • To add to this: I have uninstalled the HPC Pack and SQL Server, did a reinstall, and got the same result. The first node installs well; the second node fails (and the third and fourth, tried in serial mode). I had to run startnet again on the compute node machine to get it to work. Strange!

    I have now tried this on five different compute nodes. VMware is probably causing some network latency or loss, which causes startnet (part of the bare-metal install) to fail on some network connection. When I restart it, everything is okay. Maybe there needs to be more resilience in the build process?

    • Edited by AlexSamad, July 11, 2008 6:06 (update)
    July 11, 2008 3:52
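
The resilience idea raised in the post above can be sketched as a retry with exponential backoff around the network step. This is only an illustration in Python under stated assumptions: it is not how startnet or the HPC Pack deployment is actually implemented, and `retry_with_backoff`, `check_head_node`, the host name, and the port are all hypothetical.

```python
import socket
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between failed tries.
    Re-raises the last OSError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # Connection and timeout errors are subclasses of OSError.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def check_head_node(host, port, timeout=5.0):
    """Hypothetical probe: open and close a TCP connection to the head node."""
    with socket.create_connection((host, port), timeout=timeout):
        return True
```

A call such as `retry_with_backoff(lambda: check_head_node("headnode", 445))` would then tolerate a few dropped connections before giving up; the host name and port here are placeholders, not values taken from the thread.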

    HPC Server 2008 shipped in September 2008, so I'm going through and marking all questions in the beta forum as 'answered'.

    March 25, 2009 23:52