How to avoid resource starvation for low priority jobs?

  • Here's the scenario: I have low-priority jobs in a queue behind some higher-priority jobs, and the nodes are all busy. More high-priority jobs are submitted and queued before the low-priority jobs can run, which means the low-priority jobs never run.

    Is there any way to avoid this, or would I have to implement something myself? For example, as a job's age increases, its priority could slowly be bumped up.

    I'm using HPC Pack 2012, with scheduling mode "queued", and graceful preemption.
    • Edited by TimJRoberts1 Thursday, September 25, 2014 3:08 PM added version and settings
    Thursday, September 25, 2014 3:07 PM

All replies

  • Yes. In "Queued" mode, the scheduler always runs jobs in order from high priority to low priority. If you want low-priority jobs to run, use "Balanced" mode instead: it tries to start every job with its minimum resource requirement, and if there are still free resources afterwards, it allocates them to the higher-priority jobs. That way, a low-priority job can at least start with its minimum number of cores.
    Friday, September 26, 2014 5:19 AM
  • I tried Balanced mode with Priority Bias set to High, Medium, and None, but I can still reproduce the same problem.
    In my test I'm submitting to a cluster with 2 compute nodes (4 cores and 1 socket each).
    I submit a high-priority job with 2 tasks that require 4 cores each. Each task simply sleeps for 5 seconds.
    I then submit a low-priority job with 1 task that requires 1 core.
    Because there is already a queue when the low-priority job is submitted, and jobs are submitted faster than they complete, the low-priority task sits in the queue forever.
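The starvation in this test can be reproduced with a small discrete-time simulation. This is a simplified model of strict priority ordering, not HPC Pack's actual scheduler: it treats each high-priority job as a single 8-core unit of work lasting 5 seconds, assumes a new one arrives every 4 seconds (the post only says submissions outpace completions), and stops at the first queued job that does not fit. The names `Job` and `simulate_strict_priority` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # larger value = higher priority
    cores: int      # cores the job needs before it can start
    duration: int   # run time in simulated seconds
    submitted: int  # submission time in simulated seconds


def simulate_strict_priority(total_cores, arrivals, horizon):
    """Simplified model (not HPC Pack's scheduler): start queued jobs strictly
    in priority order, stopping at the first job that does not fit (no
    backfilling). Returns {job name: start time}."""
    queue, running, started = [], [], {}
    for t in range(horizon):
        # Free the cores of jobs that have finished by time t.
        running = [(job, end) for job, end in running if end > t]
        # Add jobs submitted at time t, then order the queue:
        # highest priority first, earlier submission breaks ties.
        queue.extend(job for job in arrivals if job.submitted == t)
        queue.sort(key=lambda job: (-job.priority, job.submitted))
        free = total_cores - sum(job.cores for job, _ in running)
        while queue and queue[0].cores <= free:
            job = queue.pop(0)
            free -= job.cores
            running.append((job, t + job.duration))
            started[job.name] = t
    return started


# 2 nodes x 4 cores = 8 cores. Hypothetical submission pattern: an 8-core,
# 5 s high-priority job every 4 s, plus one 1-core low-priority job at t = 1.
arrivals = [Job(f"high{i}", 2, 8, 5, 4 * i) for i in range(25)]
arrivals.append(Job("low", 0, 1, 5, 1))
started = simulate_strict_priority(8, arrivals, horizon=100)
print(started.get("low"))  # None: the low-priority job never starts
```

Every time cores are released, a newer high-priority job is already waiting ahead of the 1-core job, so the low-priority job is never reached. In this workload skipping ahead (backfilling) would not help either, because each high-priority job occupies all 8 cores.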

    • Edited by TimJRoberts1 Friday, September 26, 2014 3:04 PM added priority balance modes
    Friday, September 26, 2014 2:36 PM
  • In your test there are 2 × 4 = 8 cores in total, and the high-priority job needs all 8, so no cores are left for the low-priority job.

    Here is an example of how Balanced mode allocates cores:

    1. Suppose there are 10 cores in the cluster.

    2. job1: high priority, needs 8-auto. This means the job needs at least 8 cores to run, and can grow if there are still free cores.

    3. job2: normal priority, needs 1-auto.

    4. job3: low priority, needs 1-auto.

    5. job4: low priority, needs 1-auto.

    The result is that job1, job2, and job3 run, but job4 cannot. The scheduler satisfies minimum core requirements in priority order (job1, then job2, then job3), and once those three jobs have taken 8 + 1 + 1 = 10 cores, none are left for job4.

    If you want low-priority jobs to run, reduce the minimum requirement of the high-priority job.
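The allocation in this example can be sketched as two passes: first give every job its minimum in priority order, then hand spare cores to jobs that can grow. This is only a simplified model of the behavior described in the reply, not HPC Pack's implementation. `JobRequest`, `balanced_allocation`, the numeric priority values, and the rule that pass 1 stops at the first job that does not fit are assumptions, and an "auto" maximum is treated as unbounded.

```python
from typing import NamedTuple


class JobRequest(NamedTuple):
    name: str
    priority: int   # larger value = higher priority
    min_cores: int  # minimum cores the job needs to start
    can_grow: bool  # True for an "N-auto" request that may take spare cores


def balanced_allocation(total_cores, jobs):
    """Simplified Balanced-mode model (not HPC Pack's code).
    Returns {name: allocated cores} for the jobs that start."""
    order = sorted(jobs, key=lambda job: -job.priority)  # stable for ties
    free = total_cores
    alloc = {}
    # Pass 1: give each job its minimum, from high to low priority.
    # Stopping at the first job that does not fit is an assumption.
    for job in order:
        if job.min_cores > free:
            break
        alloc[job.name] = job.min_cores
        free -= job.min_cores
    # Pass 2: hand spare cores to started, growable jobs, highest priority
    # first ("auto" is treated as unbounded in this sketch).
    for job in order:
        if free == 0:
            break
        if job.name in alloc and job.can_grow:
            alloc[job.name] += free
            free = 0
    return alloc


HIGH, NORMAL, LOW = 2, 1, 0  # hypothetical numeric priorities
jobs = [
    JobRequest("job1", HIGH, 8, True),
    JobRequest("job2", NORMAL, 1, True),
    JobRequest("job3", LOW, 1, True),
    JobRequest("job4", LOW, 1, True),
]
print(balanced_allocation(10, jobs))  # {'job1': 8, 'job2': 1, 'job3': 1}
```

In this model, lowering job1's minimum to 7 cores leaves room for job4 as well (`{'job1': 7, 'job2': 1, 'job3': 1, 'job4': 1}`), which is the trade-off the reply recommends.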

    Sunday, September 28, 2014 6:07 AM
  • OK, thanks for the reply. I cannot reduce the high-priority job's minimum core requirement, because this task has a high RAM requirement; running several of them on the same node would cause problems.

    I have implemented my own "aging" by slightly increasing the priority of jobs every X seconds, so that the low-priority jobs eventually reach a high enough priority to run.
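One way to model this kind of aging is to raise the priority of every job still in the queue by a fixed step at a fixed interval, up to a cap. The sketch below is a simulation under assumed values, not the poster's actual code or an HPC Pack API: `simulate_with_aging`, `make_arrivals`, the 10-second interval, the step of 1, the cap of 4, and the workload (an 8-core, 5-second high-priority job every 4 seconds on 8 cores, plus one 1-core low-priority job) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # current priority, larger = higher; raised while queued
    cores: int      # cores the job needs before it can start
    duration: int   # run time in simulated seconds
    submitted: int  # submission time in simulated seconds


def simulate_with_aging(total_cores, arrivals, horizon, interval, step, max_priority):
    """Simplified model: strict priority scheduling plus aging. Every
    `interval` seconds, each job still in the queue gains `step` priority,
    capped at `max_priority`. Returns {job name: start time}."""
    queue, running, started = [], [], {}
    for t in range(horizon):
        running = [(job, end) for job, end in running if end > t]
        if t > 0 and t % interval == 0:
            for job in queue:  # only jobs that are still waiting age
                job.priority = min(job.priority + step, max_priority)
        queue.extend(job for job in arrivals if job.submitted == t)
        # Highest priority first; earlier submission breaks ties.
        queue.sort(key=lambda job: (-job.priority, job.submitted))
        free = total_cores - sum(job.cores for job, _ in running)
        while queue and queue[0].cores <= free:
            job = queue.pop(0)
            free -= job.cores
            running.append((job, t + job.duration))
            started[job.name] = t
    return started


def make_arrivals():
    # Hypothetical workload; rebuilt per run because aging mutates priorities.
    jobs = [Job(f"high{i}", 3, 8, 5, 4 * i) for i in range(25)]
    jobs.append(Job("low", 1, 1, 5, 1))
    return jobs


started = simulate_with_aging(8, make_arrivals(), horizon=100,
                              interval=10, step=1, max_priority=4)
print(started["low"])  # 25
```

The low-priority job has waited longer than the high-priority jobs queued behind it, so its priority catches up and it wins a tie at priority 3 (ties go to the earlier submission), starting at t = 25. With `step=0` the same workload never starts it within the 100-second horizon. The cap keeps priorities in a bounded range so that aged jobs cannot climb indefinitely.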

    Monday, September 29, 2014 10:36 AM