WinPE won't load the network device driver during Windows 2008 HPC Server deployment, despite the presence of the driver

  • Question

  • I've been banging my head against the wall on this one...

     

    My boot.wim got too big again, so I followed the instructions in the following KB article to shrink it:
    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=fd55089b-36a8-4844-80a4-a8e0c6f9dc31
    This has happened before, and I thought that I had it patched, but the procedure is easy -- just turn the crank, and no big deal.

     

    Once I finished this procedure, WinPE booted again, but it refuses to load the driver for my compute node's Ethernet adapter.  I used the Cluster Manager's "Manage drivers" to remove and reload the driver.  I even updated the driver.

     

    When I get to the compute node's console, I run "wpeutil InitializeNetwork" and check with "ipconfig /all" until I get bored, but no luck.  The funny part is that the driver *is* included in the boot image, and I can manually load it with the following command:

    drvload X:\CCPDrivers\b06nd\b06nd.inf
    This causes ipconfig to show me the proper IP configuration as handed out by my DHCP server, and I can ping around with no problems.

     

    I'd understood that WinPE was supposed to walk through the .inf files in the X:\CCPDrivers\ subtree and try them all.  However, it's clearly not doing that in this case.  Does anyone have any ideas on how to poke it so that it does?
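
The walk described above (find every .inf file under X:\CCPDrivers\ and try each one) can be sketched as follows. This is only a minimal illustration of the logic, not the mechanism HPC Pack or WinPE actually uses. Python is not part of a stock WinPE image, so the assumption here is that the script runs on another machine (for example, against a mounted copy of the image) to generate the drvload lines for a batch file. The function name `drvload_commands` is hypothetical.

```python
import os


def drvload_commands(driver_root):
    """Return one 'drvload "<path>"' command per .inf file under driver_root."""
    commands = []
    for folder, subdirs, files in os.walk(driver_root):
        subdirs.sort()  # visit subfolders in a stable, alphabetical order
        for name in sorted(files):
            # Driver packages are identified by their .inf file (case-insensitive).
            if name.lower().endswith(".inf"):
                inf_path = os.path.join(folder, name)
                commands.append('drvload "{}"'.format(inf_path))
    return commands


if __name__ == "__main__":
    # X:\CCPDrivers is the folder named in this thread; adjust the path as needed.
    for command in drvload_commands(r"X:\CCPDrivers"):
        print(command)
```

The printed lines could be pasted into a batch file and run from the WinPE console, which automates the manual drvload step but does not explain why the automatic load is skipped.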

     

    Thanks,
    -Luke

    • Edited by Luke Scharf Thursday, July 22, 2010 3:24 PM More silly typos
    Wednesday, July 21, 2010 9:22 PM

All replies

  • Bump.

     

    Does anyone have any ideas at all?

     

    Thanks,

    -Luke

    Tuesday, July 27, 2010 3:30 PM
  • Luke - there should be a file on the compute nodes, c:\executionclient.log (or something like that).  Would you be able to post the contents of that file?  It will show what happens at the point where the drivers are supposed to be loaded automatically.

     

    Mark

     

    Tuesday, July 27, 2010 5:52 PM
  • That log file doesn't exist, since the installation doesn't even get started.  The preboot environment comes up and drops me to a command-line window.  The closest thing that I can find to an executionclient.log in the preboot environment is X:\Windows\system32\wpeinit.log:

     

    Info No unattend file was found; WPEINIT is using default settings to initialize WinPE
    Info Spent 7800ms initializing removable media before unattend search
    Info ==== Initializing Display Settings ====
    Info No display settings specified
    Info STATUS: SUCCESS (0x00000001)
    Info ==== Initializing Computer Name ====
    Info Generating a random computer name
    Info STATUS: SUCCESS (0x00000000)
    Info ==== Initializing Virtual Memory Paging File ====
    Info No WinPE page file setting specified
    Info STATUS: SUCCESS (0x00000001)
    Info ==== Initializing Optional Components ====
    Info STATUS: SUCCESS (0x00000000)
    Info ==== Initializing Network Access and Applying Configuration ====
    Info No EnableNetwork unattend setting was specified; the default action for this context is to enable networking support.
    Info Service dhcp stop: 0x00000000
    Info Service lmhosts stop: 0x00000000
    Info Service bfe stop: 0x00000000
    Info Service ikeext stop: 0x00000000
    Info Service mpssvc stop: 0x00000000
    Info Spent 156ms initializing security templates; status 0x00000000
    Info Install MS_MSCLIENT: 0x0004a020
    Info Install MS_NETBIOS: 0x0004a020
    Info Install MS_SMB: 0x0004a020
    Info Install MS_TCPIP6: 0x0004a020
    Info Install MS_TCPIP: 0x0004a020
    Info Spent 5850ms installing network components
    Info iSCSI: iBFT ACPI Table is not available on this system
    Info Spent 873ms installing network drivers
    Error QueryAdapterStatus: no adapters found.
    Info Spent 0ms confirming network initialization; status 0x80004005
    Info WaitForNetworkToInitialize failed; ignoring error
    Info STATUS: SUCCESS (0x003d0001)
    Info ==== Applying Firewall Settings ====
    Info STATUS: SUCCESS (0x00000001)
    Info ==== Executing Synchronous User-Provided Commands ====
    Info STATUS: SUCCESS (0x00000001)
    Info ==== Executing Asynchronous User-Provided Commands ====
    Info STATUS: SUCCESS (0x00000001)
    Info ==== Applying Shutdown Settings ====
    Info No shutdown setting was specified
    Info STATUS: SUCCESS (0x00000001)
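
A long wpeinit.log like the one above is easier to scan if the suspicious lines are pulled out automatically. The following is a minimal sketch, assuming the hexadecimal values in the log follow the usual Windows HRESULT convention, where a set high bit marks a failure (0x80004005 is the generic E_FAIL code). The function names are hypothetical, and the sample text is copied from the log in this post.

```python
import re

# Matches 8-digit hexadecimal codes such as 0x80004005.
HEX_CODE = re.compile(r"0x([0-9a-fA-F]{8})\b")


def is_failure_code(hex_digits):
    """Assumption: codes are HRESULT-style, so a set high bit means failure."""
    return (int(hex_digits, 16) & 0x80000000) != 0


def find_problems(log_text):
    """Return lines that start with 'Error', mention 'failed', or carry a failure code."""
    problems = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        has_bad_code = any(is_failure_code(h) for h in HEX_CODE.findall(line))
        if line.startswith("Error") or "failed" in line.lower() or has_bad_code:
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = """\
    Info Spent 873ms installing network drivers
    Error QueryAdapterStatus: no adapters found.
    Info Spent 0ms confirming network initialization; status 0x80004005
    Info WaitForNetworkToInitialize failed; ignoring error
    Info STATUS: SUCCESS (0x003d0001)
    """
    for line in find_problems(sample):
        print(line)
```

On this log, the flagged lines all sit in the network initialization section, which is consistent with no adapter being present until the driver is loaded by hand.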
    

    I can manually "drvload X:\CCPDrivers\b06nd\b06nd.inf" and then "net use Z: \\server\share" to copy the logs off in the preboot environment.  But I'm at a loss as to how to fix the preboot environment so that it really can run in an unattended mode -- and none of the wpeinit and wpeutil commands that I've tried have been able to get the installation started again, either.

     

    Thanks!!

    -Luke

    Tuesday, July 27, 2010 9:12 PM
  • Dear Luke,

    When your machines are being deployed from bare metal it is expected that they will boot from the private network into WINPE and continue on with the deployment...

    You should see a log file in c:\ that keeps a record of the deployment - especially what happens in WINPE.

    Here is an example from a recent deployment (see below).  This same process should happen except there would be errors showing where it couldn't configure the network adapters (and the mounting of the share and copying of files subsequently cannot proceed).

    Just to be clear, you do have your machines configured correctly to boot from the right network adapter (the one on your private network)?  This whole process should be automated and you shouldn't have to worry about doing anything under WINPE - it is basically set it and forget it (unless you want to customize the OS unattend installation or the disk partitions).

    Could you also please confirm what topology you are trying to run (Topology 1?)

    Thanks,

    Mark

     

    -- Snippet of Sample WINPE Execution Client Log --

    07/27/10 11:26:32: **** Constructing connection ****
    07/27/10 11:26:32: **** Initializing Crypto Provider ****
    07/27/10 11:26:32: **** Crypto provider not found, creating ****
    07/27/10 11:26:32: **** Running in WinPE ****
    07/27/10 11:26:32: **** Retrieving Guid and MAC info ****
    07/27/10 11:26:32: **** SMBIOS Version 2.5 ****
    07/27/10 11:26:32: **** Guid retrieved: 08BC1D0D-84C8-11DE-BBDA-B30BE5A10025 ****
    07/27/10 11:26:32: **** Discovering Adapters ****
    07/27/10 11:26:32: Intel(R) 82567LM-3 Gigabit Network Connection

    07/27/10 11:26:32: **** Mac Retrieved: 00:25:B3:0B:E5:A1X ****
    07/27/10 11:26:32: **** Initializing Server Proxy ****
    07/27/10 11:26:32: **** Connecting to host <head node name> ****
    07/27/10 11:26:32: **** DNS Resolution ****
    07/27/10 11:26:32: **** Using IP address 192.168.0.1 ****
    07/27/10 11:26:32: **** Build Socket ****
    07/27/10 11:26:32: **** Connect ****
    07/27/10 11:26:32: **** Connection Success ****
    07/27/10 11:26:32: **** Initialization Complete! ****
    07/27/10 11:26:32: **** Sending initial start flag ****
    07/27/10 11:26:38: COMMAND: net use /delete z: & net use z: \\<head node name>\REMINST "*******" /user:"*******"
    The network connection could not be found.
    More help is available by typing NET HELPMSG 2250.
    The command completed successfully.

    07/27/10 11:26:40: **** Command execution finished, sending result 0 ****
    07/27/10 11:26:40: **** Result sent to server ****
    07/27/10 11:26:45: COMMAND: robocopy "Z:\\config" "x:\\\\" "diskpart.txt" /R:5 /W:5

    -------------------------------------------------------------------------------
       ROBOCOPY     ::     Robust File Copy for Windows
    -------------------------------------------------------------------------------

      Started : Tue Jul 27 11:26:45 2010
       Source : Z:\config\
         Dest : x:\
        Files : diskpart.txt
      Options : /COPY:DAT /R:5 /W:5

    ------------------------------------------------------------------------------

                        1 Z:\config\
         New File         124 diskpart.txt
      0%
    100%

    ------------------------------------------------------------------------------

    Wednesday, July 28, 2010 5:31 PM
  • The network topology is set to "Compute nodes isolated on a private network", which is a close enough approximation of our real network that it's worked up to this point.

     

    There is no C: or Z: on the client, and no diskpart.txt on the WDS client / compute node.  I imagine that the installation could proceed, if it were loading the network driver that I can load manually from the console.

     

    I just can't figure out how to get the client to run "drvload X:\CCPDrivers\b06nd\b06nd.inf" to bring up the network, so that it can contact the server and start grabbing things like diskpart.txt.  A working network driver is integrated into the boot.wim (as evidenced by the fact that I can bring up the network through the compute node's console), but it just seems to be ignored.

     

    Is there a collection of scripts that WinPE uses to direct the execution of the installer that I should be reading to learn how WDS and WinPE work?  Is there anything like an /etc/init.d/rc.sysinit in WinPE?

     

    Thanks again!
    -Luke

    Wednesday, July 28, 2010 5:55 PM
  • Dear Luke,

      This whole process should be automated with no need for "manual intervention".  What should be happening is that 1) your machine PXE boots, 2) it requests the WINPE image from the head node over TFTP and starts the WINPE environment, and 3) it establishes a connection to the install share on the head node and proceeds with the installation from there (copying the OS and installing HPC Pack).

      I was asking to see the execution client logs because those logs are created as soon as your machines are booted into WINPE, when steps 1 and 2 are automated through HPC Server.  (Even a screen shot or two, put up somewhere, of what is happening at that stage in your deployment would be useful.)

      You mention that you have no C:\ or Z:\ on the client - the Z:\ is the share that is mounted during our installation so the node can get to the installation files.  If there is no C:\ then you may have to do some customization in the disk partitioning scripts to get things working... but as you mention, we aren't there yet, as the drivers aren't loading when things are being done automatically.  What I need to understand is which part of the automation process is failing to load these drivers.  When you watch the process (through a KVM connection on the compute node), you should see the node boot into WINPE and then start to load all the drivers that were sent down with it.

      Can you verify that the drivers are actually being included with the OS Image?  When you add drivers do you see a change in size of your boot.wim?  What happens if you switch to Topology 5 (complete the network wizard) and then switch back to Topology 1 - and try again?  Does that work?

      If you post an e-mail address I can get back to you in a more timely fashion.

    Mark

     

    Wednesday, July 28, 2010 6:52 PM
  • Is changing to Topology 5 (rather than, say, Topology 2) important?

     

    I tried it several times with Topology 2, and it didn't work -- but when I tried Topology 5, it worked.  (But my private network connection was renamed.)

     

    Maybe the next version of the HPC Cluster Manager should have a "rebuild my boot.WIM" button?

     

    Now I'm having trouble with the partitioning tool, but that usually goes away if I run "dd if=/dev/zero of=/dev/sda" or diskpart.exe's "clean" command on the node...

     

    Thanks,

    -Luke

    • Edited by Luke Scharf Wednesday, July 28, 2010 9:51 PM
    Wednesday, July 28, 2010 9:21 PM
  • Changing to Topology 5 and then back to your Topology 1 (or anything other than 5) actually does just that - it rebuilds your wim, resets the WDS service, reinjects the drivers into the boot.wim, and all that stuff.

    The reason it does this is that Topology 5 does not support bare metal deployment....

    Just to confirm you're using HPC Server 2008 and not HPC Server 2008 R2? Right?

    Mark

     

    • Marked as answer by Luke Scharf Wednesday, July 28, 2010 9:52 PM
    Wednesday, July 28, 2010 9:25 PM
  • P.s. did you install the Fix for the growing boot.wim issue and the latest HPC Server 2008 Service pack?
    Wednesday, July 28, 2010 9:26 PM
  • In re-reading your post - looks like the switch to topology 5 and back worked.

    Please post again if you are still seeing issues.

    • Proposed as answer by Mark Staveley Wednesday, July 28, 2010 9:41 PM
    • Marked as answer by Luke Scharf Wednesday, July 28, 2010 9:51 PM
    Wednesday, July 28, 2010 9:41 PM
  • "P.s. did you install the Fix for the growing boot.wim issue and the latest HPC Server 2008 Service pack?"

     

    I thought I installed the patch, but the boot.wim was still growing.  I got the URL for the hotfix out of my notes from that incident, and followed the instructions.

     

    Also, if changing to Topology 2 was not sufficient (since it didn't delete/add the private network connection to the node image?), then the wording on the hotfix page may need to be refined.  I don't have a good way to reproduce the problem, though, and I can't just read the scripts/source code the way I'm accustomed to in the Linux world -- so I can't really tell if there's a reason it should have been different when I tried Topology 5 instead of Topology 2.

    Wednesday, July 28, 2010 9:50 PM
  • Hi Luke -

     

    1) Here is a link to SP2 - it fixes the growing Boot.wim problem so after the Boot.wim grows and you repair it, it won't happen any more (http://blogs.technet.com/b/windowshpc/archive/2010/05/12/hpc-pack-2008-service-pack-2-sp2.aspx)

    2) Switching to Topology 5 and then back to the topology you want to use (1 through 4) is a way to clean things up in terms of the boot.wim, the drivers, and the hosts files.

    3) Without seeing your cluster setup, my hands are tied in terms of diagnosing things further - although if there is anything more you want to post in terms of questions and such, I'll do my best to help.

     

    Mark 

     

    Wednesday, July 28, 2010 9:59 PM