Ensure GUID matches SMBIOS GUID - Deployment issue

    Question

  • Dear All,

    I get this error message when I try to deploy nodes:

    -------------------------------------------------------------------

    **** Sending Initial Start Flag ****

    **** Termination flag received from server ****

    **** If no actions completed, this may indicate a problem identifying the node ****

    **** Ensure the Machine GUID for this node in HPC Cluster Manager, matches the SMBIOS GUID in this log ****

    **** Mismatches may result from faulty BIOS revisions ****

    **** Shutting down connection ****

    The system cannot find the file X:\Windows\nodename.hpc.

    Access is denied

    Invalid drive specification

    -----------------------------------------------------------------------

    I add nodes to HPC Cluster Manager by using node XML files (I use the "NodeXMLEditor_v6.xlsb" Excel template, where I specify the node hostname, the MAC address of the NIC I PXE boot with, the management IP, and the domain).

    I am currently setting up my deployment templates, so I am re-imaging the nodes very often at the moment.

    This works fine 2-3 times, then I get the above error message.

    I can work around the issue by removing the node from the cluster and re-adding it with the node XML file. Then I can redeploy the node again, but the same issue comes back after 2-3 redeployments.

    I have also tried disabling the GUID checks by adding this registry entry on the head node, but it has not helped.

    HKLM\SOFTWARE\Microsoft\HPC\DisableSMBIOS type DWORD value 1
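The registry entry above can also be set from a script. This is a minimal Python sketch using the standard-library `winreg` module; it assumes it runs elevated on the head node under 64-bit Python (so HKLM\SOFTWARE is not redirected to WOW6432Node), and the helper name `set_disable_smbios` is hypothetical. The key and value name are exactly as given in this thread. Later replies check that the head node was rebooted, or the management service restarted, after the value was added.

```python
import sys

# Key and value name exactly as quoted in this thread (hypothetical helper below).
HPC_KEY = r"SOFTWARE\Microsoft\HPC"
VALUE_NAME = "DisableSMBIOS"


def set_disable_smbios(enabled=True, dry_run=True):
    """Set HKLM\\SOFTWARE\\Microsoft\\HPC\\DisableSMBIOS (REG_DWORD) to 1 or 0.

    With dry_run=True (the default) nothing is written; the intended change
    is only returned so it can be reviewed first.
    """
    data = 1 if enabled else 0
    change = ("HKLM\\" + HPC_KEY, VALUE_NAME, "REG_DWORD", data)
    if dry_run:
        return change
    if sys.platform != "win32":
        raise OSError("winreg is only available on Windows")
    import winreg  # Windows-only standard-library module

    # Needs an elevated process; opens (or creates) the key for writing.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, HPC_KEY, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, data)
    return change


if __name__ == "__main__":
    # Preview only; pass dry_run=False on the head node to actually write it.
    print(set_disable_smbios(enabled=True))
```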

    Any input would be very much appreciated.

    Br

    Patric

     

    Monday, March 07, 2011 8:32 AM

All replies

  • Is it a hardware problem or a driver problem?

    But deployment has been successful.

    why not  guid,mac altenative select deploy?

    • Proposed as answer by bradelee Tuesday, March 08, 2011 2:17 AM
    • Unproposed as answer by Don PatteeModerator Friday, March 25, 2011 10:51 PM
    Tuesday, March 08, 2011 2:06 AM
  • Hi,

     

    Sorry, I don't quite understand what you are asking.

    I am guessing you want to know what hardware and software I am using?

    HP BL460c G1 blade servers (they have the infamous BCM5708S NIC). I have tried adding the monolithic driver provided by Broadcom, but this only seems to apply to Server 2003 and below (and the funny thing is that it works fine 90% of the time, until I have to remove the node and re-add it).

    I am deploying HPC 2008 R2

    I am not sure what you mean by "why not  guid,mac altenative select deploy?"

    Br

    Patric

     

    Tuesday, March 08, 2011 8:36 AM
  • HPC 2008 will not recognize the node during installation if the GUID is not correct.
    With Linux, for example, you would not see this error.
    The MAC address alone can identify the nodes, so why require a GUID that cannot be recognized?
    Could you make sure to use only one of the two, either the MAC address or the GUID?

     

    Thursday, March 24, 2011 12:46 AM
  • Hi Patric,

    At what stage of deployment are you seeing this message?  Can you provide the full provisioning log and/or full executionclient log?

    It definitely seems odd that it works 2-3 times before you see this message.  How often are you re-deploying your nodes?

    --Brian

    Saturday, March 26, 2011 12:02 AM
  • Hi Brian,

    It is at the very first stage of deployment, just before it is supposed to start copying the diskpart config file. The only step before that is creating an account in AD; then the node PXE boots, and that is when I see the message. And yes, it can work fine 2-3 times, then fail, then work 2-3 times and fail again. (Removing the node and re-importing the node XML file then lets me redeploy again.)

    I am currently setting up around 10 different images/templates, so I am redeploying nodes very frequently at the moment.

    Bradelee, yes, I would love to skip the GUID part and only use the MAC address. That is why I have tried to disable the GUID checks as stated in the first post (I do not enter a GUID into my node XML file either).

    I will edit this post once I get a failure again and post any logs I can find.

    BR

    PG

     

    Tuesday, March 29, 2011 8:31 AM
    There is a known case where the SMBIOS GUID on some IBM hardware is inconsistent with the Machine GUID reported by the execution client.

    When you set the registry with HKLM\SOFTWARE\Microsoft\HPC\DisableSMBIOS type DWORD value 1, did you also remove the GUIDs from the Node XML when importing the nodes?

    I would suggest that you edit the Node XML and remove the GUIDs if you haven't already. 

    The supported cases for the Node XML are:

    - Name + MAC Address for Private Network NIC

    - Name + GUID

    - Name + GUID + 1 or More MAC addresses

    If you don't have the GUID and have more than one MAC address, this is where you need to use the registry setting.
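The combinations listed above can be written down as a small check. This is a hypothetical Python helper (not part of HPC Pack) that takes a node's name, optional GUID and MAC addresses as plain values, so it makes no assumption about the node XML schema; it simply encodes the rules as stated in this reply.

```python
from typing import Optional, Sequence


def classify_node_entry(name: str, guid: Optional[str], macs: Sequence[str]) -> str:
    """Map a node's identifying fields onto the combinations described above."""
    if not name:
        # Every supported combination includes the node name.
        return "unsupported: a node name is always required"
    if guid:
        if macs:
            return "supported: Name + GUID + 1 or more MAC addresses"
        return "supported: Name + GUID"
    if len(macs) == 1:
        return "supported: Name + MAC address for private network NIC"
    if len(macs) > 1:
        return "needs registry setting: no GUID and more than one MAC address"
    return "unsupported: neither a GUID nor a MAC address was given"


if __name__ == "__main__":
    # A name and a single private-network MAC, no GUID.
    print(classify_node_entry("steldsw303", None, ["78e3b50a2760"]))
```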

     

    Hope this helps,

    Mark

     

    • Proposed as answer by Mark Staveley Thursday, March 31, 2011 4:23 PM
    • Unproposed as answer by Mark Staveley Wednesday, May 04, 2011 1:38 AM
    • Proposed as answer by Mark Staveley Wednesday, November 09, 2011 1:56 AM
    Tuesday, March 29, 2011 8:34 PM
  • Hi Mark,

    I still have the registry entry enabled on the headnode.

    I have never used a GUID in the node XML files.

    What I enter into the node XML file is:

    Hostname + Domain + ManagementIpAdress + MacAddress1 (where I assumed ManagementIpAdress and MacAddress1 are for the private network)

    Are the above details OK to use in the XML file, or am I entering too much information or the wrong information?

    Br

    Patric

     

    Wednesday, March 30, 2011 8:13 AM
  • I would just try the name and the MacAddress (for the private network adapter), and reset the registry entry.

    Mark

     

    Wednesday, March 30, 2011 5:41 PM
  • Hi Mark,

    OK, thanks, I will give that a try!

    Br

    Patric

     

    Wednesday, April 06, 2011 12:06 PM
  • Hi,

    Same issue when using just the MAC + hostname with the registry entry removed.

    I guess I will just have to keep removing and re-adding the hosts when this issue happens, unless there are any other ideas out there?

    Br

    /Patric

     

    Tuesday, April 19, 2011 1:30 PM
  • The other things I would check are:

     

    1) BIOS version on the head node and the compute node

    2) any firmware updates for network adapters

    3) that you have the latest drivers (and add these drivers not only to the head node but also in the "add drivers" section of the To-do List)

     

    It is possible that one GUID is returned to the head node during PXE boot, and that once the node is up and running WinPE, the GUID obtained through WinPE does not match the one reported during PXE boot.

    When the machine PXE boots, can you write down or copy the GUID that is displayed with the MAC address, and then check it against the GUID displayed within the WinPE environment?

     

    Mark

     

    Tuesday, April 19, 2011 3:58 PM
  • You could also try the name and the MacAddress (for the private network adapter) and have the registry key to ignore the GUIDs
    Tuesday, April 19, 2011 7:33 PM
  • Hi,

    Yes, I seem to get a different GUID at PXE boot initialization and in WinPE:

    at PXE boot dhcp request :         34343737 3037 4742 3838 313145425946

    in WinPE when discovering nics : 37373434 3730 4247 3838 313145425946

    But at the time I wrote these numbers down, the machine deployment worked OK.

    I will try the latest drivers and BIOS for this machine and check if that helps.
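The two GUIDs above differ only in the byte order of their first three fields (34343737 vs 37373434, 3037 vs 3730, 4742 vs 4247); the last two fields are identical. One plausible reading, not confirmed anywhere in this thread, is that the PXE firmware and WinPE disagree about the byte order of those fields: the SMBIOS specification only made the little-endian encoding of the first three UUID fields explicit in version 2.6, and the WinPE log in this thread reports SMBIOS Version 2.4. This minimal Python sketch uses the standard `uuid` module, whose `bytes_le` attribute reverses exactly those three fields, to test whether two reported GUIDs are the same value up to that swap. The function names are hypothetical.

```python
import uuid


def normalize(guid_text: str) -> uuid.UUID:
    """Parse a GUID written with spaces (as shown at PXE boot) or dashes."""
    return uuid.UUID(hex="".join(guid_text.split()))


def byte_swapped(u: uuid.UUID) -> uuid.UUID:
    """Reverse the bytes of the first three fields (big- <-> little-endian)."""
    return uuid.UUID(bytes=u.bytes_le)


def same_up_to_byte_order(a: str, b: str) -> bool:
    ua, ub = normalize(a), normalize(b)
    return ua == ub or byte_swapped(ua) == ub


pxe = "34343737 3037 4742 3838 313145425946"      # from the PXE DHCP request
winpe = "37373434-3730-4247-3838-313145425946"    # from the WinPE log
print(same_up_to_byte_order(pxe, winpe))          # True
print(byte_swapped(normalize(pxe)))               # 37373434-3730-4247-3838-313145425946
```

Since the swap is its own inverse, the argument order does not matter.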

    Br

    Patric

     

    Tuesday, April 26, 2011 10:40 AM
  • Are you saying that the deployment is working fine now with the following configuration:

     

    Registry Key + Name and Private Mac Address (No GUID) in the Node XML?

    Tuesday, April 26, 2011 3:59 PM
  • Hi Mark,

    No, unfortunately not.

    Just that the deployment worked that specific time I wrote down the GUID numbers (i.e. they are different, but at that specific time the deployment worked even though PXE and WinPE detected different GUIDs).

    /PG

    Wednesday, April 27, 2011 7:03 AM
  • The registry key should override the checking of the guids.

    Is there a time skew between the time set in WinPE and the head node?

    Could you also please confirm at what stage the deployment is failing:

    1) Adding Node To Cluster either through PXE boot request or node XML

    2) Head Node Responds to PXE request and the node boots into WinPE (you will see the boot.wim being copied if you are looking at the Compute Node)

    3) In the WinPE environment you should see:

       i) enabling network

       ii) mounting partition on head node

       iii) copying disk partition parameter file

       iv) running disk partition

       v) either multicasting or unicasting the OS

     

    Are you making it past this point, or are you getting stuck between i) and ii)?

    Also, can you try this with a fresh machine? There may be residual entries relating to the machine you are using (if you are using it over and over again for testing).

    Thanks,

    Mark

     

     

    Wednesday, April 27, 2011 4:36 PM
  • Hi,

    It is between i) and ii)

    Here is the text copied directly from a failed deployment this morning:

    Initializing Network Interfaces...

     

    The command completed successfully.

    Contacting CommandServer on HeadNode CLUSTHN201

    **** Constructing connection ****

    **** Initializing Crypto Provider ****

    **** Crypto provider not found, creating ****

    **** Running in WinPE ****

    **** Retrieving Guid and MAC info ****

    **** SMBIOS Version 2.4 ****

    **** Guid retrieved: 37373434-3730-4247-3838-313145425946 ****

    **** Discovering Adapters ****

    Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client) #4

    Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client) #3

    Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client) #2

    Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client)

    **** Mac Retrieved: 00:1E:0B:5D:B0:56X00:1E:0B:EC:5C:50X00:1E:0B:EC:5C:52X00:1E:0B:5D:B0:48X ****

    **** Initializing Server Proxy ****

    **** Connecting to host CLUSTHN201 ****

    **** DNS Resolution ****

    **** Using IP address 192.168.0.201 ****

    **** Build Socket ****

    **** Connect ****

    **** Connection Success ****

    **** Initialization Complete! ****

    **** Sending initial start flag ****

    **** Termination flag received from server ****

    **** If no actions completed, this may indicate a problem identifying the node ****

    **** Ensure the Machine GUID for this node in HPC Cluster Manager, matches the SMBIOS GUID in this log ****

    **** Mismatches may result from faulty BIOS revisions ****

    **** Shutting down connection ****

    The system cannot find the file X:\Windows\nodename.hpc.

    Access is denied.

    Invalid drive specification

    0 File(s) copied

     

    X:\Windows\system32>

     

    All my nodes are already imported into the head node using the node XML import function (i.e. I have no fresh servers to try on), but as I mentioned earlier, no GUIDs have ever been entered into the XML files.

    Is there a known location in the SQL database where the node GUID is saved? Perhaps my only option is to clear the data manually from the database?

     

    The failure above is with:

     

    1) Only hostname and MAC used in the XML file

    2) GUID checking disabled on the head node by the registry entry

    3) Latest available firmware on the server and NIC

    BR

    Patric

     


    Friday, April 29, 2011 8:27 AM
  • Can you confirm that you rebooted the head node or restarted the management service after adding the registry entry to disable GUID checking?
    Friday, April 29, 2011 7:55 PM
  • Hi,

    Yes, confirmed. I have rebooted the head node many times after adding the registry entry.

    Br
    Patric

     

    Monday, May 02, 2011 8:45 AM
  • Dear Patric,

     

    I'm not sure why this isn't working. It seems very strange that it can work 2-3 times and then fail for no apparent reason. One thing I wasn't sure about from your posting is the architecture of the machines and the OS you are using.

     

     Can you please confirm that you are using a 64-bit version of the Operating System and that your machines are 64-bit.

     

    Also, to help with diagnosis, it would be helpful if you could do the following:

     

    1) Change the Logging Level for HPC -> regedit HKLM -> Software -> Microsoft -> HPC -> TraceLevel (change this to 4)

    2) Enable Logs in Event Viewer - look under Applications and Services -> Microsoft -> HPC - right click on the different folders and enable the logs

     

     Try a few deployments and see if anything is in the logs for your setup.

     You also mentioned something above about a monolithic driver, are you 100% sure this is the correct driver to be adding?  Do you get this same issue if you remove that driver and just use the in-box drivers?

     

    Thanks,

    Mark

    Monday, May 02, 2011 8:18 PM
  • Hi Mark,

    Yes, it is quite strange. Normally you have a problem and are stuck at some point; these intermittent issues are usually hard to fix :(

    Can you please confirm that you are using a 64-bit version of the Operating System and that your machines are 64-bit.

    Yes, I am using HP BL460c G1 blade servers (http://vb.net/products/HP/12518_div.html); they have two Xeon 5365 CPUs (http://ark.intel.com/Product.aspx?id=30702).

    And I am deploying HPC Edition 2008 R2, which I believe is only available as an x64 version (the head node has the same version of Windows).

    You also mentioned something above about a monolithic driver, are you 100% sure this is the correct driver to be adding? 

    No, I think that was a mistake on my part. That driver was for the older WDS/RIS setup. I have since removed it and added the latest NIC driver from HP instead (after removing the driver, I changed the topology to 5 and back to rebuild the WDS boot image, just to be sure it had been removed properly).

    1) Change the Logging Level for HPC -> regedit HKLM -> Software -> Microsoft -> HPC -> TraceLevel (change this to 4)

    2) Enable Logs in Event Viewer - look under Applications and Services -> Microsoft -> HPC - right click on the different folders and enable the logs

    I will enable this and post back here with the results.

    Br

    Patric

     

     

    Tuesday, May 03, 2011 7:28 AM
  • Hi again,

    Nothing seems to be logged in the event log for the deployment failure until I cancel the deployment manually (i.e. when I get the SMBIOS mismatch error on the server, the deployment scheduler still thinks deployment is ongoing, and I can only see the error on the node console).

    Then I basically get a message saying that the deployment failed because it was cancelled by an administrator.

    Are there any other log files I can check? x:\windows\ExecutionClient.log only shows the exact same message I posted a few posts up.

    Also, the file "X:\Windows\nodename.hpc" is not actually there. I'm not sure how that works, but there is no file matching *.hpc anywhere on the X: drive.

    Br

    Patric

     

     


    Tuesday, May 03, 2011 8:43 AM
  • When you change the TraceLevel setting in the registry, the HPC management service will generate verbose logging to an ETL log file.  If you are not running Service Pack 1 of HPC Pack 2008 R2 then I would recommend running the patch from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=aa86c849-15c0-4f97-a7b4-a23a2ba37890&displaylang=en.

    After SP1 is installed, you can convert the ETL log to a text file by running the following command from an elevated cmd prompt:  "HpcTrace getlog mgmt HpcManagementLog.txt"

    It might also be a good idea to confirm that your cluster hardware has the latest available BIOS and firmware updates from HP. 

    --Brian

     

    Tuesday, May 03, 2011 8:54 PM
  • Hi,

    It might also be a good idea to confirm that your cluster hardware has the latest available BIOS and firmware updates from HP. 

    I have the latest firmware on everything in the enclosure I am currently testing deployment on, i.e. OA, iLO, servers, disks and array controllers, all flashed a couple of weeks ago (including the head node).

    If you are not running Service Pack 1 of HPC Pack 2008 R2 then I would recommend running the patch from

    I thought I had SP1, but apparently not. The patch is now installed, and I have verified that the head node now has version 3.1.xxx.x.

    HpcTrace getlog mgmt HpcManagementLog.txt

    does generate the HpcManagementLog.txt file; however, it is 0 bytes and has no lines in it (even though in PowerShell I get "Processed 2 lines in 0,002 seconds").

    I have run the deployment again after installing SP1, and it failed on the first deployment, but nothing showed up in the log.

     

    Then I removed the node from the head node again and tried a deployment. This time it worked OK, and now I can see one line for the earlier failed deployment:

    2011-05-04 09:52:03 [Error][5][HpcManagement]  The operation 'Assigning template TEST to node NODE001'  failed to run correctly. The operation was initiated by the user: rdpdeploy. The operation can be identified by the GUID: 056d14fc-8714-4b2c-87ab-1a17f992f2ac. Using this GUID a log of the operation can be obtained from the HPC PowerShell command: Get-HpcOperation -id 056d14fc-8714-4b2c-87ab-1a17f992f2ac | Get-HpcOperationLog

     

    If I then run:

    Get-HpcOperation -id 056d14fc-8714-4b2c-87ab-1a17f992f2ac | Get-HpcOperationLog

     

    I get the following:

     

    Message                                    TimeCreated               Severity
    -------                                    -----------               --------
    Assigning template TEST to node NODE001... 2011-05-04 09:25:15       Information
    Moving node MYDOMAIN\NODE001 from state... 2011-05-04 09:25:15       Information
    Associating template TEST with node S...   2011-05-04 09:25:15       Information
    Restarting node NODE001                    2011-05-04 09:25:15       Information
    Executing command "C:\Program Files\M...   2011-05-04 09:25:15       Information
    Waiting for the node to reboot             2011-05-04 09:25:20       Information
    Initiating provisioning operations fo...   2011-05-04 09:25:21       Information
    Connecting to DC: MYDOMAIN.COM             2011-05-04 09:25:21       Information
    Searching for an existing account in ...   2011-05-04 09:25:21       Information
    Found an existing account in Active D...   2011-05-04 09:25:21       Information
    Initiating configuration operations f...   2011-05-04 09:25:21       Information
    Waiting for node to boot into WINPE        2011-05-04 09:25:21       Information
    Sending PXE command to boot node to W...   2011-05-04 09:26:58       Information
    Sending PXE command to boot node to W...   2011-05-04 09:27:02       Information
    The administrator cancelled the opera...   2011-05-04 09:52:02       Warning
    The parent operation is being rolled ...   2011-05-04 09:52:02       Warning
    Disassociating template from node NODE...  2011-05-04 09:52:02       Information
    Reverted                                   2011-05-04 09:52:03       Information

     

    I.e. I manually cancel the deployment operation when I get the GUID mismatch, as the scheduler still thinks the node is deploying even though it has stopped at the GUID mismatch stage.
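When several failed deployments pile up, the operation GUID in HpcManagement error lines like the one above can be extracted mechanically to build the same `Get-HpcOperation ... | Get-HpcOperationLog` command. This is a hypothetical Python helper; the regular expression assumes the exact wording "identified by the GUID:" seen in the error line posted here, and other messages may be phrased differently.

```python
import re
from typing import Optional

# Wording taken from the HpcManagement error line above; other messages may differ.
_OPERATION_GUID = re.compile(r"identified by the GUID: ([0-9a-fA-F-]{36})")


def operation_log_command(error_line: str) -> Optional[str]:
    """Return the HPC PowerShell command that fetches the operation log for
    the GUID mentioned in an error line, or None if no GUID is found."""
    match = _OPERATION_GUID.search(error_line)
    if match is None:
        return None
    return f"Get-HpcOperation -id {match.group(1)} | Get-HpcOperationLog"
```

The returned string is meant to be pasted into an HPC PowerShell window on the head node.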

     

    Br

    Patric

     

    Wednesday, May 04, 2011 8:49 AM
  • Hi,

    Is there another log file or procedure I can use to narrow down what could be causing this?

    I have looked through the tables and data in the database, but have not found any obvious GUID information I can delete.

    BR

    Patric

     

    Wednesday, May 11, 2011 6:54 AM
  • Another option would be to capture a Network Monitor (Netmon) trace of exactly what is being sent and received by the head node when the deployments fail. That kind of information would be helpful.

    Wednesday, May 11, 2011 4:19 PM
  • Hi Patric,

    We might be able to provide more insight into the problem if you could provide an example of a <Node> entry from your nodes.XML file.

    For example, if your nodes.xml includes <MacAddress> elements other than the one for your private network adapter (which is the minimum required), and WinPE doesn't see all of those MAC addresses (perhaps because you haven't injected a driver for all interfaces), then it's possible that the management service will not completely "match" your node with the information you've provided. The solution is either to trim the extraneous MacAddress information from your nodes.xml file or to make sure that WinPE can confirm the MAC addresses of all interfaces listed in your nodes.xml file.
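This check can be done offline by comparing the MAC addresses in a node's XML entry with the "Mac Retrieved" line printed by the WinPE execution client. A minimal Python sketch, assuming (from the log posted earlier in this thread) that the retrieved MACs are colon-separated and each is terminated by an `X`, and that nodes.xml uses bare 12-digit hex as in the file posted below; the function names and the `aabbccddeeff` address are hypothetical.

```python
import re


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits (drops ':' and '-')."""
    return re.sub(r"[^0-9a-fA-F]", "", mac).lower()


def macs_from_log_line(line: str) -> set:
    """Extract MACs from a line such as
    '**** Mac Retrieved: 00:1E:0B:5D:B0:56X00:1E:0B:EC:5C:50X ****'.
    The 'X' separator is inferred from the posted log, not documented."""
    payload = line.split("Mac Retrieved:", 1)[1].replace("*", "")
    return {normalize_mac(m) for m in payload.split("X") if m.strip()}


def macs_missing_in_winpe(xml_macs, log_line: str) -> list:
    """MACs listed in nodes.xml that WinPE did not report."""
    seen = macs_from_log_line(log_line)
    return [m for m in xml_macs if normalize_mac(m) not in seen]


log = ("**** Mac Retrieved: 00:1E:0B:5D:B0:56X00:1E:0B:EC:5C:50X"
       "00:1E:0B:EC:5C:52X00:1E:0B:5D:B0:48X ****")
print(macs_missing_in_winpe(["001E0B5DB056"], log))                  # []
print(macs_missing_in_winpe(["001e0b5db056", "aabbccddeeff"], log))  # ['aabbccddeeff']
```

An empty result means every MAC in the XML entry was seen by WinPE; anything listed is a candidate for the partial-match problem described above.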

    Thanks,
    --Brian




    • Edited by Brian Broker Monday, June 20, 2011 9:28 AM add clarification
    Monday, June 20, 2011 8:54 AM
  • Hi,

    I reinstalled my head node last week, just in case there was some odd data in the database, like MAC addresses from another NIC; however, I still get the same issue.

    Since the reinstall, I have only added two nodes, and I added them one by one using separate XML files.

    Here is a copy of the XML file used to import one of the nodes:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Nodes xmlns="http://schemas.microsoft.com/HpcNodeConfigurationFile/2007/12">
      <Node Name="steldsw303">
        <Location />
        <Template Name="" />
        <MacAddress>78e3b50a2760</MacAddress>
      </Node>
    </Nodes>

    Br

    Patric

     

    Monday, June 20, 2011 2:25 PM
  • Hi Patric,

    You appear to be using a node deployment template whose name is an empty string.

    Why?

    A node element in your nodes.xml file might look like this:

      <Node Name="steldsw303">
        <MacAddress>78e3b50a2760</MacAddress>
      </Node>
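Instead of hand-editing the Excel output, a file of this shape can be generated directly. A minimal Python sketch with the standard `xml.etree.ElementTree` module (Python 3.9+ for `ET.indent`), reusing the namespace from the nodes.xml posted above; the function name is hypothetical, and it only emits Name plus a single MacAddress, the minimal case discussed in this thread, with no Location or Template elements.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/HpcNodeConfigurationFile/2007/12"


def minimal_nodes_xml(nodes):
    """Build nodes.xml text from (name, mac) pairs: Name + MacAddress only."""
    ET.register_namespace("", NS)  # serialize as a default xmlns, no prefix
    root = ET.Element(f"{{{NS}}}Nodes")
    for name, mac in nodes:
        node = ET.SubElement(root, f"{{{NS}}}Node", Name=name)
        ET.SubElement(node, f"{{{NS}}}MacAddress").text = mac
    ET.indent(root)  # pretty-print with two-space indentation
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_nodes_xml([("steldsw303", "78e3b50a2760")]))
```

The resulting file is imported through HPC Cluster Manager in the same way as before.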

    Regards,
    --Brian

    Wednesday, August 10, 2011 8:20 AM
  • Hi Brian,

     

    Sorry, I don't quite understand what you mean. Is there a field that I have left empty that should be populated? I basically just enter details into the fields in the Excel template and then save it.

    BR

    Patric

     

    Tuesday, August 16, 2011 8:09 AM
  • In the node XML example you provided, you list a blank deployment template. All Brian is saying is that it is simpler to remove the element entirely. Was there any particular reason for keeping the empty XML element in your template?

     

    Also are you still seeing this issue?

    Thursday, October 20, 2011 6:44 PM
  • Hi

    As stated in an earlier post, I basically open the Excel node template and enter details into three fields:

    1 Hostname 

    2 ManagementIpAdress

    3 MacAddress1 

    Then I save the template, and the output is what you can see above. I don't know why the Excel template still outputs the <Template Name="" /> element, as I do not even enter any data into that cell. Are you saying I should manually edit the XML file after saving it and remove the template name line?

    Anyway, I think this is an issue with BL460c G1 servers. Since my last post I have received some new BL460c G7 servers, and I do not see the issue on them.

    So it won't be an issue for me once I have replaced all 120 of my G1 servers.

    So yes and no :) It is still an issue with G1 servers, but as I am replacing my servers, I won't personally need this thread to remain open.

    Br

    Patric

     

    Friday, October 21, 2011 6:53 AM
  • Ahh, so I'm seeing this issue with IBM blades. I'm new to HPC Pack; we're running HPC Pack 2008 R2 SP2. I'm getting the error message "Ensure the Machine GUID for this node in HPC Cluster Manager, matches the SMBIOS GUID in this log". I'm working on the registry setting to disable the SMBIOS check now and will try that. Where is the XML file that I'm supposed to edit? That hasn't jumped out at me yet.

    Thanks for any help you can offer.

    Thursday, November 03, 2011 7:22 PM
  • HKLM\SOFTWARE\Microsoft\HPC\DisableSMBIOS type DWORD value 1

     

    This worked for me.  It continued on to the image, and is in mid-copy now.  Thanks!

    Thursday, November 03, 2011 7:34 PM