TFTP Download failed during PXE boot

    Question

  • Hi,

    I tried to deploy a node from bare metal in HPC Cluster Manager. First I booted using the network adapter's scanning ROM option, and it worked once. I did not change any settings, but the next time I restarted the compute node it booted from the hard disk instead of booting into PXE. Using Windows Network Monitor I discovered that the compute node does not send a DHCP request, so I decided to boot iPXE from CD-ROM instead, and with that I did see a DHCP request. So far so good. I used HPC Cluster Manager's "Reimaging" option and looked at the "Provisioning Log". It told me that after Cluster Manager is "Waiting for node to boot into WINPE" it is "Sending PXE command to boot node to WINPE (expected boot time: 5-15 minutes)". However, iPXE connects to the DHCP server correctly but then tells me that the TFTP download fails. I started troubleshooting by setting the DHCP server's "066 Boot Server Host Name" option, and to my surprise it worked. But again, without changing any settings, the next time I reimaged the compute node I received the same "TFTP download failed" error. I have no idea why this error occurs.

    Windows Server 2012 R2 (installed)

    HPC Pack 2012 R2 (installed)

    DHCP Server (installed and configured using HPC Cluster Manager's Network Configuration Wizard with Topology 1)

    WDS's Transport Server (installed as described in HPC Cluster Manager's Deployment: Windows Deployment Services Test)

    DNS Server (installed)

    AD DS (installed)

    Please tell me if you need any further information about configuration settings etc.

    Thanks in advance...

    Bene

    Friday, December 19, 2014 3:41 PM

All replies

  • Hi Bene,

    It is expected that the compute node booted from the hard disk when you restarted it, because bare metal deployment writes the image specified in the node template to the compute node. After the image has been written successfully, the compute node boots from the hard disk. Why did you expect it to still boot into WinPE?

    I am not sure what you mean by "Use iPXE to boot from CD-ROM". Do you mean you booted from a WinPE provided on the CD-ROM? If so, that is not supported: the WinPE image in HPC (it can be found at %CCP_DATA%boot\x64) is customized, so you cannot use a standard WinPE for HPC bare metal deployment.

    Below are some useful links for bare metal deployment:

    Deploy Nodes from Bare Metal

    Troubleshooting PXE Boot Failures During Baremetal Node Deployment

    Monday, December 22, 2014 3:37 AM
  • Hi Sunbin,

    thanks for your fast reply.

    How does the compute node know that it should boot from the hard disk when, according to the BIOS settings, it is still set to "boot from network" or, in my new case, "boot from CD"?

    I tried to reimage the compute node. As I did not see a DHCP Discover from the compute node, I decided to boot from an iPXE image on CD. iPXE lets me specify which network adapter to use and, as mentioned in my first post, with it I got the DHCP Discover. The head node offers, the compute node requests, and the head node acknowledges, as expected.

    iPXE tells me:

    tftp://192.168.2.1/Boot\x64\WDSNBP.com... ok

     Downloaded WDSNBP from ....

     WDSNBP started using DHCP Referral

    Contacting Server: 192.168.2.1 (Gateway: 0.0.0.0).

    Architecture: x64

    Contacting Server: 192.168.2.1..

    TFTP Download: Boot\x64\WDSNBP.com

    ..........

    TFTP Download failed!!

    According to the WDS log in Event Viewer, the TFTP download finished successfully (I guess that refers to the WDSNBP.com file).

    Windows Network Monitor tells me the TFTP download failed due to an access violation (error code 4).
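
    For reference, the error reported by the capture can be double-checked by decoding the TFTP ERROR packet by hand. The following is a minimal Python sketch, assuming the raw UDP payload of the packet has been exported from the capture. The function name decode_tftp_error is hypothetical. The code table follows RFC 1350, which lists code 2 as "Access violation" and code 4 as "Illegal TFTP operation", so it may be worth comparing the numeric code with the text the capture tool shows.

```python
import struct

# Error codes as listed in RFC 1350 (the TFTP specification).
TFTP_ERROR_CODES = {
    0: "Not defined, see error message",
    1: "File not found",
    2: "Access violation",
    3: "Disk full or allocation exceeded",
    4: "Illegal TFTP operation",
    5: "Unknown transfer ID",
    6: "File already exists",
    7: "No such user",
}

OPCODE_ERROR = 5  # TFTP opcode for an ERROR packet


def decode_tftp_error(payload: bytes):
    """Return (code, meaning, message) for a TFTP ERROR packet, else None.

    `payload` is assumed to be the UDP payload only (no IP/UDP headers).
    """
    if len(payload) < 4:
        return None
    # Both fields are 2-byte unsigned integers in network byte order.
    opcode, code = struct.unpack("!HH", payload[:4])
    if opcode != OPCODE_ERROR:
        return None
    # The error message is a NUL-terminated ASCII string after the header.
    raw_message = payload[4:].split(b"\x00", 1)[0]
    message = raw_message.decode("ascii", errors="replace")
    return code, TFTP_ERROR_CODES.get(code, "Unknown code"), message
```

    Calling decode_tftp_error with the exported bytes returns the numeric code, its RFC 1350 meaning, and the free-text message sent by the server, which often says more than the code alone.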

    I have studied these links on bare metal deployment thoroughly, without success.

    Any other ideas? :D

    Thanks in advance...

    Bene

    Monday, December 22, 2014 11:34 AM
  • The compute node will still boot from the network adapter first; the head node then tells the compute node to run abortpxe.com, which aborts PXE and boots to the next device (i.e. the hard disk).
    From the iPXE log, WDSNBP.com was successfully downloaded and started. It should then proceed to download pxeboot.n12; however, according to the log it tried to download WDSNBP.com again from the same location and failed.
    Can you try again (to keep the log clear, please deploy just one machine) and share the HpcManagement logs with me? The HpcManagement logs are at %CCP_DATA%LogFiles\Management\HpcManagement_xxxxxx.bin; I need the one with the second-largest "xxxxxx" number. For example, if HpcManagement_000009.bin has the largest number, please share the file HpcManagement_000008.bin.
    The following may also help to analyze the issue:
    1. The node template you used for bare metal deployment (in "Configuration->Node Templates", right-click the node template and choose "export")
    2. A screenshot of "Configuration->Network"
    3. The WDS log.
    4. The net traffic capture between the compute node and head node.
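
    The log-file selection described above (take the file with the second-largest number) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and it assumes the CCP_DATA environment variable is set on the head node and that the log files follow the HpcManagement_<number>.bin naming pattern.

```python
import os
import re
from pathlib import Path

# Matches names such as HpcManagement_000008.bin and captures the number.
LOG_NAME = re.compile(r"^HpcManagement_(\d+)\.bin$", re.IGNORECASE)


def second_newest_log(names):
    """Return the file name with the second-largest number, or None."""
    numbered = []
    for name in names:
        match = LOG_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # ascending by number
    if len(numbered) < 2:
        return None
    return numbered[-2][1]


def find_log_to_share():
    """Look in the assumed log folder and return the path to share, or None."""
    ccp_data = os.environ.get("CCP_DATA")  # assumption: set on the head node
    if not ccp_data:
        return None
    log_dir = Path(ccp_data) / "LogFiles" / "Management"
    if not log_dir.is_dir():
        return None
    name = second_newest_log(entry.name for entry in log_dir.iterdir())
    return log_dir / name if name else None
```

    Running find_log_to_share() on the head node should return the path of the file to send, or None if the folder or a second log file is not found.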

    Please send to suzhu@microsoft.com

    Thanks,
    Sunbin

    • Edited by Sunbin Zhu Wednesday, December 24, 2014 2:38 AM
    Wednesday, December 24, 2014 2:36 AM