Virtualizing head node

    Question

  • Hi - our head node used to be a physical machine, but a couple of months ago we converted it to a virtual machine using disk2vhd. This seemed to work fine, but recently we had to migrate our entire HPC deployment to a new subnet. For the most part the nodes still work correctly. The problem is that tasks take far too long to be assigned: about 15 minutes per task.

    Looking closer at the network configuration of the head node, it still shows its old physical adapter for the NIC, and it also shows its old IP from the previous subnet.  If I try and "configure network", I get an error that "there are not enough online network adapters with IPv4 enabled to configure the network". 

    Is there a problem with running the head node as a virtual machine? Is there something different I need to do for a subnet change? I can't find any documentation addressing these situations, so I am looking for other ideas on what might be causing this. Thank you.

    Monday, September 14, 2015 1:52 PM

All replies

  • What version of HPC Pack are you using? If the head node is virtualized in Hyper-V, we think it should be okay, since some of our functional testing environments run on VMs. If you have a new NIC, or your existing NIC needs a new IP address, you have to re-run the network configuration in the to-do list.

    Qiufang


    Qiufang Shi

    Tuesday, September 15, 2015 1:57 AM
  • Hi - thank you for the response.  We are using HPC Pack 2012 R2.  When I try and configure the network I get the error "there are not enough online network adapters with IPv4 enabled to configure the network".  I added another virtual NIC (so now it has two), but I still get the same error when trying to configure the network.

    The behavior we are seeing is that large jobs (2500+ tasks) don't get distributed quickly. It takes 15 minutes for each node to get assigned a task. With small jobs, tasks seem to get distributed correctly. This is what led me to look into networking issues, but if there are any other possibilities, I'd like to look into those.

    Thanks!

    Wednesday, September 16, 2015 3:38 PM
  • With a common hardware configuration and standard SQL Server, the HPC scheduler can dispatch 200 tasks per second. So if you have 2500 idle cores, all tasks of a 2500-task job can start running within 30 seconds. Is there any specific configuration for your job?

    - Are there any task dependencies in your job?

    - Does it have a large number of environment variables?
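
    As a rough check of the figures above, here is a minimal Python sketch. It assumes a steady dispatch rate of 200 tasks per second with no startup overhead, and it treats the roughly 15 minutes reported earlier in the thread as the total time to assign the job's tasks. The function name is illustrative only, not part of HPC Pack.

```python
def dispatch_seconds(task_count, tasks_per_second=200.0):
    """Ideal time to dispatch task_count tasks at a steady rate (no overhead)."""
    if tasks_per_second <= 0:
        raise ValueError("tasks_per_second must be positive")
    return task_count / tasks_per_second


expected = dispatch_seconds(2500)   # 2500 / 200 = 12.5 seconds
observed = 15 * 60                  # about 15 minutes, as reported in the thread
slowdown = observed / expected      # 900 / 12.5 = 72.0

print(f"expected ~{expected:.1f}s, observed ~{observed}s, about {slowdown:.0f}x slower")
```

    Under these assumptions the job is dispatched dozens of times slower than the quoted rate, which points at something specific to the job or the head node rather than normal scheduler latency.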



    Qiufang Shi

    Thursday, September 17, 2015 5:27 AM
  • The error means that it cannot find any online network adapter with an IPv4 address. Can you run "ipconfig /all" and check whether the network adapters are configured correctly on the head node VM?
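
    If the "ipconfig /all" output is long, a small script can list which adapters actually report an IPv4 address. This is only a sketch: it assumes English-language output with the usual "... adapter NAME:" headers and "IPv4 Address" lines, and the sample text below is made up for illustration (it is not from the poster's machine). It approximates, but does not reproduce, the check that HPC Pack performs.

```python
import re

# Adapter sections start with an unindented line such as "Ethernet adapter Ethernet:".
ADAPTER_RE = re.compile(r"^(\S.*adapter .*):\s*$")
# Detail lines look like "   IPv4 Address. . . . . : 192.0.2.10(Preferred)".
IPV4_RE = re.compile(r"IPv4 Address[ .]*:\s*(\d+\.\d+\.\d+\.\d+)")


def adapters_with_ipv4(ipconfig_text):
    """Map each adapter header to the list of IPv4 addresses found under it."""
    adapters = {}
    current = None
    for line in ipconfig_text.splitlines():
        header = ADAPTER_RE.match(line)
        if header:
            current = header.group(1)
            adapters[current] = []
            continue
        found = IPV4_RE.search(line)
        if found and current is not None:
            adapters[current].append(found.group(1))
    return adapters


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Ethernet adapter Ethernet:

   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   IPv4 Address. . . . . . . . . . . : 192.0.2.10(Preferred)

Ethernet adapter Ethernet 2:

   Media State . . . . . . . . . . . : Media disconnected
"""

for name, addresses in adapters_with_ipv4(SAMPLE).items():
    print(name, "->", addresses or "no IPv4 address")
```

    An adapter that appears with an empty list is a candidate for the "not enough online network adapters with IPv4 enabled" error.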
    Thursday, September 17, 2015 5:59 AM
  • The networking is mostly working fine; it's just showing the old physical network adapter. It seems that now that the machine is virtual, it does not recognize its virtual adapters as valid NICs (even though they are functioning). Is there a way to reset the network configuration so that it forces you to run the wizard, or to reconfigure the head node in general?

    Thursday, September 17, 2015 6:01 PM
  • All of our nodes are workstation nodes and we assign tasks per node, not per core. When the job has a large number of tasks it takes over 15 minutes for each task to get assigned. If the job has 100 tasks, it will assign them all right away and things seem to work fine.  There are no obvious bottlenecks on the server with proc/memory/disk, but it definitely is related to the number of tasks in the job.  What troubleshooting tool or logs does HPC have that I can look at to try and profile the problem the box is experiencing?
    Thursday, September 17, 2015 6:04 PM
  • The logs are under %CCP_HOME%Data\LogFiles. For job-related logs, look under Scheduler; for network management logs, look under Management.

    The logs are the files with the .bin extension. You can run the following command to parse one:

    hpctrace parselog <logfilename>

    It will generate one log file, which you can open with any text editor or viewer.

    For the network issue, can you open HpcClusterManager, select "Node management" in the left navigation pane, then select "Operations", and check whether there are any failed operations, especially ones related to "Update the configuration......"? If there are, please paste the log here.
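
    With many .bin files, the parse command above can be run in a loop. The following Python sketch assumes that hpctrace is on the PATH, that the CCP_HOME environment variable is set and ends with a path separator (as the path above implies), and that the scheduler logs sit in the Scheduler subfolder. The helper names are illustrative, not part of HPC Pack.

```python
import os
import subprocess
from pathlib import Path


def parselog_commands(paths):
    """Build one 'hpctrace parselog <file>' command per .bin file, in sorted order."""
    return [
        ["hpctrace", "parselog", str(p)]
        for p in sorted(map(Path, paths))
        if p.suffix.lower() == ".bin"
    ]


def parse_all(log_dir):
    """Run hpctrace parselog on every .bin file directly inside log_dir."""
    for command in parselog_commands(Path(log_dir).iterdir()):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    ccp_home = os.environ.get("CCP_HOME")
    if ccp_home:
        # Assumed layout: %CCP_HOME%Data\LogFiles\Scheduler
        parse_all(os.path.join(ccp_home, "Data", "LogFiles", "Scheduler"))
```

    Swapping "Scheduler" for "Management" would cover the network management logs in the same way.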

    Friday, September 18, 2015 12:57 AM