I seem to be having a problem when trying to deploy compute nodes. I have been successful in installing 1 node.
The problem seems to turn up after the image has been copied across: the compute node tries to access the head node and fails. The head node then crashes (not the OS, just CCP). Every time I try to start the cluster manager I get "unable to connect", and I need to reinstall the head node.
NOTE - I am running the head node under VMware. Everything else seems to work fine, and I was able to install 1 node. The problem only occurred when I tried to install the other 3 at the same time. They are consistently stuck in the provisioning state.
I am going to try uninstalling the CCP and reinstalling it to see if that makes any difference.
To add to this: I have uninstalled the HPC Pack and the SQL Server, then did a reinstall, and got the same result. The first node installs well; the second node fails (as do the third and fourth, which I tried in serial mode). I had to run startnet again on the compute node machine to get it to work - strange!
I have now tried this on 5 different compute nodes. VMware is probably causing some network latency or loss, which causes startnet (part of the bare metal install) to fail on some network connection. When I restart it, everything is okay. Maybe there needs to be more resilience in the build process?
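To illustrate what I mean by more resilience, here is a rough sketch of a retry with increasing delays around a network check. This is only an illustration of the idea, not how startnet or the HPC Pack actually works. The names `retry` and `can_reach` are my own, and the host name and port in the usage comment are placeholders.

```python
import socket
import time


def retry(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    Only OSError (which covers refused connections and timeouts) is retried.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except OSError as exc:
            last_error = exc
            # No point sleeping after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


def can_reach(host, port, timeout=5.0):
    """Open and close a TCP connection; raises OSError if it fails."""
    with socket.create_connection((host, port), timeout=timeout):
        return True


# Hypothetical usage (host name and port are placeholders):
# retry(lambda: can_reach("headnode", 445), attempts=5, base_delay=2.0)
```

With something like this wrapped around the step that contacts the head node, a short network dropout would delay the install instead of leaving the node stuck, which is roughly what I am doing by hand when I rerun startnet.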
- Edited by AlexSamad Friday, July 11, 2008 6:06 AM update