29 May 2008 09:07
Hi,
I'm trying to deploy bare-metal nodes. I'm running Windows Server 2008 Enterprise x64 + HPC Server Beta 2 on the head node. My compute nodes boot and start WinPE. However, deployment always fails with:
"The system cannot find the file X:\windows\nodename.hpc."
The X: drive seems to be properly mounted and contains all the Windows installation files. I've added the network driver (bxnd60a.sys for the Broadcom adapter) with no success.
My head node and compute nodes are BL460c.
11 June 2008 19:19
We believe this is symptomatic of a failed network driver load when WinPE is first booted on the compute node. You can either scroll back the text in startnet.cmd's window to verify that or view x:\windows\executionclient.log with notepad. The log should show which drivers were loaded and whether they were successfully initialized.
If no network drivers were loaded, use the 'Manage Drivers (all images)' action under the Images item in the Configuration navigation pane to add drivers to the WinPE image. If the drivers are present but failed to load, then you will need to find drivers that are compatible with WinPE for your particular hardware.
Detailed explanation of the error:
When startnet.cmd is started on the WinPE boot, it enumerates the subdirectories of x:\ccpdrivers and loads each driver that was bound into the WinPE image via the "Manage Drivers" option mentioned above.
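To check what was actually bound into the image, you could list the driver folders yourself. This is only a rough sketch: the function name `list_bound_drivers` is made up for illustration, it assumes each bound driver sits in its own subdirectory with an .inf file, and it assumes you have a Python interpreter available (for example against a mounted copy of the image, since stock WinPE does not ship one). It does not load anything; it only reports what is on disk.

```python
from pathlib import Path


def list_bound_drivers(driver_root):
    """Map each driver subdirectory name to the sorted .inf file names inside it.

    Hypothetical helper: assumes one subdirectory per bound driver.
    Returns an empty dict if driver_root does not exist.
    """
    root = Path(driver_root)
    if not root.is_dir():
        return {}
    found = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        found[sub.name] = sorted(
            f.name for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() == ".inf"
        )
    return found
```

Calling `list_bound_drivers(r"x:\ccpdrivers")` and getting an empty dict, or folders with no .inf file, would suggest that no usable driver was bound into the image.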
After that, it runs ExecutionClient.exe which is the HPC management service's agent on the compute node. EC expects that network connectivity exists at this point so if it fails to open a port due to missing network drivers, it will eventually time out and exit with the appropriate error code.
Detecting a failure in EC, startnet attempts to copy error logs and any minidump files for EC to the \\<HPCClusterName>\CcpSpoolDir\Deployment\<ComputeNodeName> directory. startnet tries to extract the compute node's name from nodename.hpc which was written by EC if it successfully opened a port and received the name from the mgmt service. In this case, the file wasn't written due to an earlier error causing the error you saw displayed in the cmd window.
The bug in startnet.cmd is that it should check whether nodename.hpc exists before attempting to use it, which I will fix for RC1. However, even if the nodename file had existed, writing files to the directory would have failed, because the compute node does not yet have an identity (machine account) known to the head node and the underlying disk directory that CcpSpoolDir maps to is protected. You need to grant read/write access to Everyone in order to capture logs from failed compute node imaging attempts. Once you have finished imaging, I recommend removing the Everyone:RW setting from the security descriptor to avoid security issues.
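The existence check described above looks roughly like this. It is a sketch of the idea, not the actual startnet.cmd change: `read_node_name` is a hypothetical name, and it assumes nodename.hpc is a small text file holding just the node name, which is not confirmed here.

```python
from pathlib import Path


def read_node_name(nodename_file):
    """Return the node name stored in the file, or None if it is missing or empty.

    Assumption: the file contains only the node name as plain text.
    """
    path = Path(nodename_file)
    if not path.is_file():
        # File was never written (e.g. EC could not open a port).
        return None
    name = path.read_text(encoding="utf-8", errors="replace").strip()
    return name or None
```

A caller would test for `None` and fall back to a placeholder directory name instead of printing "The system cannot find the file".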
12 June 2008 00:24
The error you see is benign: startnet goes on to do the right thing, which is to create a directory called UNKNOWNX, where X is a number. Unfortunately, the EC log does not include the name of the node that failed, but that wouldn't do you much good anyway since the compute node hasn't been imaged yet. The EC log will contain the machine GUID, so that might help you identify which node is failing. If all nodes are failing, you have a problem that is common to all nodes, like a missing driver.
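If you end up with several UNKNOWNX directories, pulling the GUID out of each copied log can save some time. A minimal sketch, assuming the GUID appears in the usual 8-4-4-4-12 hex form (the exact log line format is not documented here, and `find_guids` is just an illustrative name):

```python
import re

# Standard textual GUID shape; an assumption about how the log prints it.
GUID_RE = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")


def find_guids(log_text):
    """Return the distinct GUID-shaped strings in log_text, lowercased, in first-seen order."""
    seen = []
    for match in GUID_RE.finditer(log_text):
        guid = match.group(0).lower()
        if guid not in seen:
            seen.append(guid)
    return seen
```

You can then match the result against the GUIDs of your physical machines to work out which node each directory belongs to.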
When I removed my drivers from "Manage Drivers" and tried to image a node, I got the following:
Initializing Network Interfaces...
The command completed successfully.
Contacting CommandServer on HeadNode HPCHN2
**** Constructing connection ****
**** Initializing Crypto Provider ****
**** Crypto provider not found, creating ****
**** Running in WinPE ****
**** Retrieving Guid and MAC info ****
**** SMBIOS Version 2.4 ****
**** Guid retreived: <machine guid> ****
**** Discovering Adapters ****
**** ERROR: No Network Adapters found. This may be caused by missing or improper network drivers. ****
**** Try adding or removing network drivers in the Admin Console. ****
The system cannot find the file X:\Windows\nodename.hpc
The network path was not found
Invalid drive specification
0 File(s) copied
If this isn't your problem, then dig through x:\windows\executionclient.log and let us know what the first error is. If you can get the log off the failed node (yes, hard to do if it has no network), then send it to us so we can figure out what might have happened.
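For finding the first error in a long log, something like the following may help once you have a copy of the file. It is a sketch under one assumption: that error lines contain the word "ERROR", as they do in the console output above. The real executionclient.log may mark errors differently, so adjust the `marker` argument if needed; `first_error_line` is a hypothetical helper name.

```python
def first_error_line(log_text, marker="ERROR"):
    """Return (line_number, line) for the first line containing marker, or None.

    Matching is case-insensitive; line numbers start at 1.
    """
    for number, line in enumerate(log_text.splitlines(), start=1):
        if marker.lower() in line.lower():
            return number, line.strip()
    return None
```

Usage would be along the lines of `first_error_line(open(path, errors="replace").read())`, where `path` points at your copy of the log.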
- Proposed as answer by CDub3, 24 June 2008 00:19
19 June 2008 01:28
These are the drivers I have loaded on my BL460s (and BL465s).
You should be able to find them on the HP site.
8 October 2008 15:08
I am experiencing a similar problem with a Dell PowerEdge 1950. Were you able to get this one resolved?
Rangam