Problem with crashing head node when deploying compute nodes


  • Hi

    I seem to be having a problem when trying to deploy compute nodes. I have been successful in installing 1 node.

    The problem seems to turn up after the image has been copied across, when the compute node tries to access the head node and fails. The head node crashes after that (not the OS, just CCP). Every time I try to start the cluster manager I get "unable to connect", and I need to reinstall the head node.

    NOTE - I am running the head node under VMware. Everything else seems to work fine, and I was able to install 1 node. The problem only occurred when I tried to install the other 3 at the same time. They are consistently stuck in the provisioning state.

    I am going to try uninstalling CCP and reinstalling it to see if that makes any difference.

    Thursday, July 10, 2008 02:28


All replies

  • To add to this: I have uninstalled the HPC Pack and SQL Server, and did a reinstall. Again, the first node installs well and the second node fails (as do the third and fourth; I tried in serial mode). I had to run startnet again on the compute node machine to get it to work. Strange!

    I have now tried this on 5 different compute nodes. VMware is probably causing some network latency or loss, which causes startnet (part of the bare metal install) to fail on some network connection. When I restart it, everything is okay. Maybe there needs to be more resilience in the build process?

    • Edited by AlexSamad Friday, July 11, 2008 06:06 update
    Friday, July 11, 2008 03:52
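
The "more resilience" suggested above could take the form of retrying a failed network step with an increasing delay, rather than failing the whole deployment on the first dropped connection. The following is only a minimal sketch of that general idea in Python, not how startnet or the HPC Pack actually behaves. The function name `retry_with_backoff`, its parameters, and the choice to treat `OSError` as the transient failure are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: `operation` stands in for whatever network step
    fails intermittently (for example, connecting to the head node).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # OSError covers ConnectionError and socket timeouts in Python 3.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing a callable that attempts the connection and raises `ConnectionError` on failure would cause it to be retried up to `attempts` times. The `sleep` parameter exists only so the delay can be replaced in tests.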

    HPC Server 2008 shipped in September 2008, so I'm going through and marking all questions in the beta forum as 'answered'.

    Wednesday, March 25, 2009 23:52