Incorrect Core Count In HPC

    Question

  • I just added 30 new ProLiant BL460c G9 servers, each with two eight-core E5-2640 2.60 GHz CPUs. The servers show 16 cores, but HPC Cluster Manager shows 8 cores and only uses 8 cores when a job is submitted. How can this be resolved so that HPC reflects and uses all of the cores on the servers?
    Thursday, March 3, 2016 4:49 PM

Answers

  • We found a solution.

    In the BIOS, we changed NUMA Group Size Optimization from Clustered to Flat. Problem solved.

    Tuesday, March 8, 2016 8:41 PM

All replies

  • Hi, what version is your HPC cluster? (Open HpcClusterManager, click the 'Help' menu, and choose 'About'; it shows the detailed version.)

    Also, in HpcClusterManager, choose 'Resource Management' (or 'Node Management' in versions before HPC 2012 R2 Update 3) and check the core count for that node. (Right-click a column header, select 'Column Chooser', and add 'Cores' and 'Subscribed Cores' to the display.)

    BTW, is your compute node configured as NUMA?


    Friday, March 4, 2016 2:22 AM
  • I tried version 3.3.3950.0 and version 4.2.4400.0 with the same result: it still shows only 8 cores instead of 16. When I log into that server, it displays 16 cores, so I know the system sees all 16. It is HPC that only sees and uses 8 cores. 'Subscribed Cores' does not display anything, and that includes the other nodes in this environment that do not have this issue. Our older machines with two eight-core E5-2680 CPUs show all 16 cores in HPC, so only the new Gen 9 servers seem to have this issue.
    Friday, March 4, 2016 1:13 PM
  • What is the OS version on the compute node? Is it Windows 2012 or something else?

    Can you check how many processor groups the node has, and whether the node is a NUMA node?
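
    One way to answer the NUMA part of this question from a script is sketched below. It is only an illustration, not something HPC Pack itself provides: on Windows it calls the Win32 function GetNumaHighestNodeNumber through ctypes, and on other systems it falls back to counting the Linux sysfs node directories, which is an assumption added here so the sketch runs anywhere. The function name highest_numa_node is hypothetical. A result above 0 means the OS reports more than one NUMA node.

```python
import ctypes
import glob
import os
import re
import sys


def highest_numa_node():
    """Return the highest NUMA node number the OS reports (0 = single node)."""
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # BOOL GetNumaHighestNodeNumber(PULONG HighestNodeNumber)
        kernel32.GetNumaHighestNodeNumber.argtypes = [ctypes.POINTER(ctypes.c_ulong)]
        kernel32.GetNumaHighestNodeNumber.restype = ctypes.c_int
        highest = ctypes.c_ulong(0)
        if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest)):
            raise ctypes.WinError(ctypes.get_last_error())
        return highest.value

    # Assumed fallback for non-Windows systems: Linux exposes one
    # directory per NUMA node under /sys/devices/system/node.
    numbers = []
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0)


if __name__ == "__main__":
    print("Highest NUMA node number:", highest_numa_node())
```

    Note that the highest node number is not guaranteed to equal the node count minus one, so treat it as an indicator rather than an exact count.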

    Monday, March 7, 2016 3:02 AM
  • The OS version for the compute node is Windows 2008 R2. Cluster Manager shows two groups and they are not NUMA nodes.

    Node State   Node Health   Cores
    Offline      OK            8

    Processors

    {Name="Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz", MaxClockSpeed="2600", L2Cache="2048"},{Name="Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz", MaxClockSpeed="2600", L2Cache="2048"}

    Monday, March 7, 2016 2:35 PM
  • I suggest upgrading the OS to Windows 2012 or Windows 2012 R2 and using HPC 2012 R2 Update 2, for the following reasons:

    1. On OS versions before Windows 2012, we still use Environment.ProcessorCount to get the core count. It returns only the cores of one processor group, and on those OS versions the cores in other processor groups cannot be utilized. From Windows 2012 on, we use a new API to get the cores from all processor groups.

    2. In HPC 2012 R2 Update 3, we fixed an issue in the HpcScheduler service so that it shows the correct core count during job submission.
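
    To illustrate point 1, the sketch below compares per-group counts with the total across all processor groups. It is not the code HPC Pack uses (the thread does not say which API that is); it assumes the Win32 functions GetActiveProcessorGroupCount and GetActiveProcessorCount, called through ctypes, and falls back to os.cpu_count() on non-Windows systems so that it still runs there. The function name logical_processor_counts is hypothetical. These calls count logical processors, so the numbers will be higher than the physical core count when Hyper-Threading is enabled.

```python
import ctypes
import os
import sys

ALL_PROCESSOR_GROUPS = 0xFFFF  # Win32 constant: count across every group


def logical_processor_counts():
    """Return (per_group_counts, total) of active logical processors."""
    if sys.platform != "win32":
        # Assumed fallback: no processor groups outside Windows.
        total = os.cpu_count() or 1
        return [total], total

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.argtypes = []
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    # DWORD GetActiveProcessorCount(WORD GroupNumber)
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong

    groups = kernel32.GetActiveProcessorGroupCount()
    per_group = [kernel32.GetActiveProcessorCount(g) for g in range(groups)]
    total = kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)
    return per_group, total


if __name__ == "__main__":
    per_group, total = logical_processor_counts()
    print("Processors per group:", per_group)
    print("Total across all groups:", total)
```

    If the per-group list has more than one entry, software that only looks at a single group will report fewer processors than the total.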

    If you have any more issues, please let us know.

    Tuesday, March 8, 2016 2:15 AM
  • I already tried moving the nodes to Windows 2012 R2 with HPC version 4.2.4400.0, with the same results. The environment where we are having this issue is production and cannot be upgraded any time soon.
    Tuesday, March 8, 2016 2:55 PM
  • That is strange. With HPC version 4.2.440 on Windows 2012 R2, the HpcClusterManager node view should show the correct core count.

    Can you try deleting that compute node from the HPC cluster? The node will then automatically rejoin the cluster, and you can assign a node template to it and check the core count again.

    BTW, can you check whether there is a member with an empty name? (In HpcClusterManager, select 'Configuration' from the left navigation panel, then select 'Users'; it shows all users. See whether any user has a SID as its name.)
    Tuesday, March 8, 2016 3:28 PM
  • I tried deleting it many times and applying different templates, with the same result: it only shows 8 cores. All of the users show up fine (no SIDs). The other 100+ servers in the cluster all show the correct number of cores.
    Tuesday, March 8, 2016 3:42 PM
  • OK, so do other servers with the same hardware configuration show the correct core count? If so, I have no idea for now.

    If possible, please send me the HpcManagement service log from that node; my email is yongtia@microsoft.com

    The log is under %CCP_HOME%Data\LogFiles\Management; the log file is HpcManagement*.bin
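
    A small sketch for collecting those files before sending them is shown below. It assumes only what is stated above (the CCP_HOME environment variable, the Data\LogFiles\Management folder, and the HpcManagement*.bin pattern); the helper name find_management_logs is hypothetical.

```python
import glob
import os


def find_management_logs(ccp_home):
    """List HpcManagement*.bin files under <ccp_home>/Data/LogFiles/Management."""
    pattern = os.path.join(
        ccp_home, "Data", "LogFiles", "Management", "HpcManagement*.bin"
    )
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    home = os.environ.get("CCP_HOME")
    if home is None:
        print("CCP_HOME is not set; run this on a node with HPC Pack installed.")
    else:
        for path in find_management_logs(home):
            print(path)
```

    Run it on the compute node in question, since the log being asked for is that node's own log.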

    Tuesday, March 8, 2016 4:03 PM
  • The other servers, while having 16 cores, have a different chipset from these new servers. I will send the HpcManagement service log to your email.
    Tuesday, March 8, 2016 4:14 PM
  • That file does not exist on the head node. Would the file HpcManagement.etl help?
    Tuesday, March 8, 2016 4:20 PM