How to achieve a highest-priority group that takes precedence over 3 other hierarchical groups

    Question

  • Dear all,

    I'm trying to achieve the following behavior in HPC 2012, but I haven't found a solution yet.

    We are submitting jobs based on sockets.

    High-level summary of what I want to achieve:
    Have 3 groups with hierarchical priority, plus a 4th group that gets all resources regardless of what's running.

    Detailed explanation:

    1. We have a hierarchy of 4 groups (Group 0, Group A, Group B and Group C) for submitting calculations.
    2. Group A takes precedence over B, which takes precedence over C.
    3. A, B and C can start in parallel, but due to the hierarchy each group starts with minimal resources until the higher-priority jobs finish. The resources freed up by the higher-priority jobs are then allocated to the next-lower-priority jobs.
    4. If higher-priority jobs start while lower-priority jobs are running, the lower-priority jobs are shrunk to their minimums and the higher-priority jobs start and grow to run as fast as possible.
    5. Group 0 is a special highest-priority group. Jobs started from this group should always get all available resources, regardless of what's running. Jobs from Groups A, B and C should be stopped and rescheduled to run later, once the Group 0 jobs have finished.

     I tried several settings and configurations, but it seems that I cannot achieve point 5.

    To achieve points 1-4 I did the following (a rough sketch of the equivalent commands follows the list):

    - Balanced scheduling mode

    - Immediate preemption

    - High bias

    - I defined 4 templates with:
                  Group 0 = Highest priority
                  Group A = AboveNormal priority
                  Group B = Normal priority
                  Group C = BelowNormal priority
                  No minimums or maximums defined at all for sockets and cores.
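
    For reference, this is roughly how the scheduler side of that configuration looks in HPC PowerShell. The property and value names below are from memory, so treat this as a sketch and verify them with Get-Help Set-HpcClusterProperty on your cluster:

        # Load the HPC PowerShell snap-in
        Add-PSSnapin Microsoft.HPC

        # Balanced scheduling mode
        Set-HpcClusterProperty -SchedulingMode Balanced

        # Immediate preemption (property/value names assumed)
        Set-HpcClusterProperty -PreemptionType Immediate

        # High priority bias (property/value names assumed)
        Set-HpcClusterProperty -PriorityBias HighBias

    The 4 templates themselves I created in HPC Cluster Manager, changing only the default priority of each.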

    The settings above cover points 1-4. Groups A, B and C behave as expected. Group 0, however, also behaves like A, B and C, which is not what we want.

    To cover point 5 I thought I'd try resource pools. I defined 2 resource pools (see the sketch below):
    - Default pool has a weight=0
    - MyPrioPool has a weight=100

    I then assigned the Group 0 job template to MyPrioPool and the templates of Groups A, B and C to the Default pool.
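
    Roughly like this in HPC PowerShell (again a sketch; the exact parameter name for the weight may differ, so please check Get-Help New-HpcPool):

        Add-PSSnapin Microsoft.HPC

        # The built-in Default pool already exists; set its weight to 0
        Set-HpcPool -Name Default -PoolWeight 0

        # Create the high-priority pool with weight 100
        New-HpcPool -Name MyPrioPool -PoolWeight 100

    The pool assignment for each template is done in the job template editor in HPC Cluster Manager.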

    According to the HPC documentation, a pool with a weight of 0 has no guaranteed cores, but it can still be allocated cores if jobs are submitted to it and the other pools are not using all of their resources. Hence, when no Group 0 jobs are present, all cores are available and Groups A, B and C should get all resources of the cluster (according to their priority, etc.). But if a Group 0 job is submitted, it should take 100% of the cluster, and all other jobs should therefore be rescheduled.

    In this configuration, group 0 jobs run fine.

    However, A, B and C jobs are not started at all. They are queued with the message "Not enough available resources". To make these jobs start, I need to modify the weight of the Default pool. If the weight is 0, the number of guaranteed cores is also 0 and the jobs stay queued. If I slowly increase the weight of the Default pool, the A, B and C jobs start as soon as the number of guaranteed cores equals 1 socket. They then grow to take all available resources if nothing else is running, which is OK.
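
    To illustrate the threshold I am seeing: my understanding from the documentation is that a pool's guaranteed cores are proportional to its weight relative to the total weight of all pools. With made-up numbers (a 64-core cluster with 8 cores per socket, purely for illustration):

        # Guaranteed cores of a pool, as I understand the formula:
        #   guaranteed = floor(totalCores * weight / sumOfAllWeights)
        $totalCores     = 64     # assumed cluster size
        $coresPerSocket = 8      # assumed socket size
        $prioWeight     = 100    # MyPrioPool
        $defaultWeight  = 10     # Default pool (the value I vary)

        $guaranteed = [math]::Floor($totalCores * $defaultWeight / ($defaultWeight + $prioWeight))
        # defaultWeight = 10 -> 5 guaranteed cores: less than 1 socket,
        # so socket-based jobs in the Default pool stay queued.
        # defaultWeight = 15 -> Floor(64 * 15 / 115) = 8 cores = 1 socket,
        # and the A, B and C jobs start.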

    But this means that I cannot prevent A, B and C jobs from "stealing" 1 socket from Group 0 jobs, because the resource pool guarantees them at least 1 socket. So this solution does not fulfill point 5.

     My questions are:

    1. Why are jobs not started when the Default pool's weight guarantees less than 1 socket?
    2. Above all, I want to achieve the scenario described, so I would very much appreciate any other ideas or suggestions on how to implement point 5 in any possible way.

    Any feedback, thoughts and suggestions are very much appreciated.

    Thank you very much in advance and best regards

    Carlos

    Thursday, June 29, 2017 9:29 AM

All replies

  • Hi Carlos,

    Which version of HPC Pack 2012 are you using? Is it HPC Pack 2012 R2 Update 3 (4.5.5079.0) with the latest June QFE (4.5.5161.0)?

    In older versions of HPC Pack 2012, there are known issues around resource pools, especially when they are specified together with node groups or requested nodes (do your jobs have these properties specified?).

    For Group 0 jobs, would it be acceptable to pause (using holduntil) all other active jobs from within the job, and resume them afterwards?
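
    For example, something along these lines, run at the start of the Group 0 job (a sketch only; the template names are yours, and you should verify the exact syntax with job modify /? on your cluster):

        # Load the HPC PowerShell snap-in to enumerate jobs
        Add-PSSnapin Microsoft.HPC

        # Find the active jobs from the lower-priority templates
        $lowPrio = @(Get-HpcJob -State Queued) + @(Get-HpcJob -State Running) |
            Where-Object { @("GroupA", "GroupB", "GroupC") -contains $_.Template }

        # Hold each of them until a date far in the future
        # (running jobs may first need to be requeued -- please verify)
        foreach ($j in $lowPrio) {
            job modify $j.Id /holduntil:"12/31/2030 12:00:00"
        }

        # ... the Group 0 work then runs with the full cluster ...

        # Afterwards, release the holds so the jobs are rescheduled;
        # I believe 'job modify' has a clear-hold option, but please
        # check 'job modify /?' for the exact parameter.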

    Regards,

    Yutong Sun

    Monday, July 03, 2017 1:49 AM