Solved: Windows HPC 2012 R2: Limit Core Resources

  • Question

  • For a specific software package we need to limit the resources (number of cores) it can use to a certain number.
    Is there a way to achieve that in Windows HPC 2012?

    BUT: It should only limit the cores needed for a certain software/template. Another software/template should be able to use those resources no matter what.

    I know this could be achieved with an activation filter, but we are not able to write one ourselves.

    Here's the problem in short:
    Software A has 27 licenses. Software B has 8 licenses. We have 36 cores available.
    If too many cores are free, queued software A jobs are started without a license being available, and they fail.

    Resource pools only guarantee a minimum of resources, not a maximum.

    Any clues or ideas?!

    • Edited by Gilbert_F Friday, January 6, 2017 8:19 AM Solved
    Monday, December 19, 2016 11:26 AM

All replies

  • This looks like a great fit for an activation filter. Without one, you could restrict software A and software B to separate node groups (GroupA and GroupB) and assign only 27 cores to GroupA and 8 cores to GroupB. When a job is submitted, associate it with the matching node group so that it cannot exceed that group's core count.

    Qiufang Shi

    Tuesday, December 20, 2016 7:55 AM
  • Qiufang Shi, thanks for your answer.

    Uhm ... how do I do this: "assign only 27 resource in GroupA" ?

    I forgot to mention that we have three servers, each with 12 cores (36 cores total).
    Server 1 is in node-group A+B and server 2+3 are in node-group A.
    So server 1 can run both software packages, whereas servers 2 and 3 are exclusive to software A.

    I tried to make an activation filter but failed miserably! Somehow, something is not working the way it should - and I downloaded the sample package from the HPC 2012 SDK...

    Tuesday, December 20, 2016 10:25 AM
  • Is your cluster only used for software A and B? And does the software need a specific CPU affinity while it is running? If not, you can look into node subscription (to set the number of cores/sockets per node as you want). Say:

    Server 1 + Server 2: total 27 cores, only run for Software A

    Server 3: Total 8 cores, only run for Software B

    What's your license server? Our sample is based on FlexLM, and it should work. An introduction to the activation filter is available here: https://technet.microsoft.com/en-us/library/ff919469(v=ws.11).aspx

    You need to know how to query the available licenses and then modify the sample a little: hold the job if there are not enough licenses available.
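
The "hold the job if there are not enough licenses" decision can be sketched roughly as below. This is only an illustration of the logic in Python, not the SDK sample: the function name `may_start` and its parameters are hypothetical, the figures in the usage lines are made up, and the license counts are assumed to come from whatever query your license server offers.

```python
def may_start(requested_cores: int, licenses_total: int, licenses_in_use: int) -> bool:
    """Return True if the job can start with the licenses currently free.

    Assumption: the software checks out exactly one license per core.
    """
    if requested_cores < 1:
        raise ValueError("a job must request at least one core")
    free_licenses = licenses_total - licenses_in_use
    return requested_cores <= free_licenses


if __name__ == "__main__":
    # Hypothetical figures: 27 licenses in total, 14 already checked out.
    print(may_start(4, 27, 14))   # 4 cores against 13 free licenses
    print(may_start(14, 27, 14))  # 14 cores against 13 free licenses
```

A real filter would translate the `False` case into whatever "do not start this job yet" signal the scheduler expects; that mapping is deliberately left out here.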

    How does it fail in your case?

    Qiufang Shi

    Wednesday, December 21, 2016 1:09 AM
  • Thanks for trying to help! I'll try to give you as much information as I can, so that you can help me.

    Again: 3 Servers, 12 cores each.
    Software A has a max. of 27 licenses (this can vary between 1 and 27 per job). Software B has a max. of 8 licenses (again: between 1 and 8 per job).
    Software A can run on all three servers and software B must run on server 1.

    So, I cannot "cheat" the nodes in having less CPUs than physically available.
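
To put numbers on this, here is a small Python sketch. Only the core, license, and server counts come from this thread; the server labels and the function name `surplus_cores` are made up for illustration. It shows how many cores the scheduler could hand out to each software beyond what the licenses cover, which is why node groups alone do not cap usage.

```python
CORES_PER_SERVER = 12

# Which servers each software may run on, as described in this thread.
# The server labels are placeholders, not real node names.
ELIGIBLE_SERVERS = {
    "A": ["server1", "server2", "server3"],
    "B": ["server1"],
}
LICENSES = {"A": 27, "B": 8}


def surplus_cores(software: str) -> int:
    """Eligible cores that are not backed by a license for this software."""
    eligible = CORES_PER_SERVER * len(ELIGIBLE_SERVERS[software])
    return max(0, eligible - LICENSES[software])


for name in ("A", "B"):
    print(name, surplus_cores(name))
# prints:
# A 9
# B 4
```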

    Activation Filter:
    I know the query and it works. I know that for sure.
    As long as it's a console application, the output (return value) is correct. But as soon as I change it to a class library, I can no longer debug it, and Windows HPC tells me that the simulation (job) was stopped by the activation filter. I also cannot write a log (to a file) from a class library... I'm not a programmer, I'm an engineer! ;-)

    Thursday, December 22, 2016 7:29 AM
  • Example:

    Jobs 690 and 692 are running and using 14 cores.

    Job ID  State    Owner      Priority     Requested Resources  Pending Reason
    690     Running  domain\me  Normal       8-8 Cores
    692     Running  domain\me  Normal       6-6 Cores
    693     Queued   domain\me  AboveNormal  4-4 Cores            The activation filter prevented this job from starting.
    694     Queued   domain\me  Lowest       8-8 Cores            Higher priority jobs take precedence.

    (Sorry for the text, can't add images to post. Even after verification)

    Now, jobs 693 and 694 should be running as well (backfilling activated and run time specified)! This queue only uses 26 of 27 cores/licenses.
    For the jobs 690 and 692, I was NOT using the activation filter (specified in the job template). The filter was used for the other two jobs.
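
The arithmetic behind the "26 of 27" figure can be checked with a short Python sketch. It assumes that all four jobs belong to software A and use one license per core; the variable names are only illustrative.

```python
LICENSES_A = 27

# (job id, state, requested cores), copied from the job listing above.
jobs = [
    (690, "Running", 8),
    (692, "Running", 6),
    (693, "Queued", 4),
    (694, "Queued", 8),
]

running = sum(cores for _, state, cores in jobs if state == "Running")
queued = sum(cores for _, state, cores in jobs if state == "Queued")
total = running + queued
fits = total <= LICENSES_A

print(running, queued, total, fits)  # prints: 14 12 26 True
```

So, under those assumptions, neither the core count nor the license count should keep jobs 693 and 694 from starting.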

    EDIT: Oh, and the log file ... no entry!

    • Edited by Gilbert_F Thursday, December 22, 2016 1:45 PM
    Thursday, December 22, 2016 1:43 PM
  • Semi-Solved.

    I am now using an activation filter from TotalCAE and have added "node ordering" to the job templates.

    Friday, January 6, 2017 8:19 AM