HPC 2008 R2 Cluster: Limit cores-per-node for a job

  • Question

  • We are running an HPC 2008 R2 cluster with 48-core compute nodes.  One of our jobs is running into IO errors when it runs on so many cores on the same node, so we would like to be able to limit the number of cores used on a given node.

    Is there any way to specify, on a per-job basis or even globally, that only a subset of the cores on a given node should be used, and the rest should be ignored?

    (I have seen the previous question, http://social.microsoft.com/Forums/en/windowshpcsched/thread/59dd7fc4-78f0-401e-9a47-867a78b5ecf1, which seems to say this is not possible.  However, that question covers an older version of Windows HPC, so I'm hoping this functionality has been added.)

    Wednesday, January 12, 2011 7:58 PM

All replies

  • Hi. I have been interested in this as well. You might find this other thread interesting:

     

    http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/d85478de-1784-4abe-bdae-f5acea9cb8ae

     

    Derek

    Wednesday, January 12, 2011 11:17 PM
  • This is still true: there is no direct way in the current releases of the HPC scheduler to either over- or under-subscribe the cores on a node.

    There are 2 job and task properties which might allow you to do what you want.

    The Exclusive property on a job/task will cause the task to be the only thing running on a node. So, regardless of how many cores, sockets, etc. the task requires or the node has, the task will be the only thing the job scheduler executes on that node.

    The minimum cores option can also be set on a task. This will reserve at least this many cores for the task. So, for instance, if there are 4 cores/node and the min cores on a task is 2, there will be a maximum of 3 tasks running on the node: the task with the minimum cores set to 2, plus potentially 2 other tasks that left min cores at its default or set it to 1. If all the tasks in the cluster have the min cores set to 2, then there would never be more than 2 tasks per node (assuming 4 cores/node).
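    The arithmetic in that example can be checked with a small sketch. This is only a simplified model of one node, assuming tasks are placed in order and each reserves exactly its minimum core count; the function name `tasks_that_fit` is hypothetical and is not part of any HPC API.

```python
def tasks_that_fit(cores_per_node, min_cores_requests):
    """Count how many tasks fit on one node, taken in order, when each
    task reserves its minimum core count (a simplified model, not the
    real scheduler's placement logic)."""
    free = cores_per_node
    placed = 0
    for need in min_cores_requests:
        if need > free:
            break  # the next task cannot get its minimum, so stop here
        free -= need
        placed += 1
    return placed


# One task with min cores 2, the rest left at the default of 1.
mixed = tasks_that_fit(4, [2, 1, 1, 1])
# Every task asks for a minimum of 2 cores.
uniform = tasks_that_fit(4, [2, 2, 2])
print(mixed, uniform)  # 3 2
```

    Under this model, setting min cores on every task is what caps the tasks per node; a single task with a higher minimum only reduces the room left for the others.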

    There is one other method that is quite Draconian. There is a mechanism to tell the OS that the number of cores on a node is less than the actual number. Then, after rebooting, the OS believes there are fewer cores than there actually are, and therefore HPC will schedule based on this reduced number.

    Friday, January 14, 2011 5:25 PM
  • Are you using job and task interchangeably? I have an SOA application (that sends 1000 requests, each with 10000 sims) and I'd like to limit the number of cores that I employ on each of my workstation nodes. I can obviously create a job template and set the maximum number of cores, but I'd like to divvy up that maximum number among all my workstation nodes.

    For example, I have 4 workstations, each with 8 cores. Is there a way to use just 4 cores on each node?

    The way I have it now is to use a Job Template with a maximum core count of 16. But when I hit the service with the 1000 requests, only 2 machines (with all of their cores) are being employed. Ideally, I'd like to use just 4 on each. Is there a way to do this (without going the 'Draconian' route described in the post above)?

    (Edit: I also specified the minimum number of nodes and the maximum number of cores on the job template, in the hope that the head node would spread core usage fairly evenly across the nodes, but that is not the case.)

    Thanks.



    • Edited by kaykaly Tuesday, March 27, 2012 10:07 PM
    • Proposed as answer by Greg Keller Friday, April 13, 2012 5:03 AM
    • Unproposed as answer by Greg Keller Friday, April 13, 2012 5:04 AM
    Tuesday, March 27, 2012 9:55 PM
  • In HPC PowerShell, for 4 total cores on a node, assuming it has 2 sockets:

    Set-HpcNode -name:node001 -SubscribedCores:4 -SubscribedSockets:2

    The number of cores has to be a multiple of the number of sockets. You can under- or oversubscribe; YMMV. We use this routinely to avoid having to turn hyperthreading on/off for different job types. Nodes must be offline to make the change. "Subscribed Cores" is now available as a column in the column chooser.

    Or use a loop to cover a whole list of nodes:
    for ($i = 257; $i -le 272; $i++) {
        # Each node must be offline before its subscribed counts can be changed.
        Set-HpcNode -Name "node$i" -SubscribedCores 4 -SubscribedSockets 1
    }
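    To see why lowering the subscribed core count would address the 4-workstation case asked about earlier in the thread, here is a rough sketch. It assumes the scheduler simply fills one node after another up to the core count it believes each node has, which matches the behaviour reported above but is not a description of the real algorithm. The function name `spread_cores` and the 2-socket workstation are assumptions made for illustration.

```python
def spread_cores(requested_cores, node_count, cores_per_node, sockets_per_node):
    """Fill nodes one after another up to their (subscribed) core count.

    Simplified model only; the real HPC scheduler has more placement rules.
    """
    # Mirrors the rule from the post: cores must be a multiple of sockets.
    if cores_per_node % sockets_per_node != 0:
        raise ValueError("core count must be a multiple of the socket count")
    allocation = []
    remaining = requested_cores
    for _ in range(node_count):
        used = min(cores_per_node, remaining)
        allocation.append(used)
        remaining -= used
    return allocation


# Job capped at 16 cores, 4 workstations, all 8 physical cores visible.
full = spread_cores(16, 4, 8, 2)
# Same job after each node is set to 4 subscribed cores.
limited = spread_cores(16, 4, 4, 2)
print(full)     # [8, 8, 0, 0]
print(limited)  # [4, 4, 4, 4]
```

    In this model the job's 16-core cap is unchanged; only the per-node ceiling moves, which is what pushes the work onto all four machines.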

    see http://technet.microsoft.com/en-us/library/ff950183.aspx

    I think this was added in HPC 2008 R2 SP2.

    Cheers!


    • Proposed as answer by Greg Keller Friday, April 13, 2012 5:20 AM
    • Edited by Greg Keller Friday, April 13, 2012 5:30 AM
    Friday, April 13, 2012 5:20 AM