User balancing and dependencies

  • Question

  • Does anyone have any advice about the following?

    I'm trying to use Microsoft Compute Cluster in a shared environment, where we basically only care that the
    cluster is kept busy 100% of the time, and that resources are shared fairly between the users that happen
    to be using the cluster. For this to happen, it looks like we need to solve a couple of problems.

    1. How do you get the cluster scheduler to automatically prioritize jobs according to each user's current share of the
        running jobs? This seems to be difficult. One brute-force solution might be to use the activation filter,
        but my guess is that this would get very slow when there are tens or hundreds of thousands of jobs in
        the queue. (Also see 2: this would interact with job dependency emulation, since if all of a user's jobs
        are waiting on dependencies, then we want other users to jump ahead of him/her even if that is "unfair".)

        Another solution might be to create a scheduled task that looks at the cluster every few seconds and
        increases the priority setting on jobs for users with fewer jobs running.

        It feels like this would only work well with dependencies if we also set up dependencies at the job level (see 2).
        If we were using task dependencies then a typical job might have 5,000 tasks, with the width of the dependency
        tree varying over time. One user might use up the whole cluster, with one or more jobs, but then we would want
        the cluster to free up processors for other users if they went on the cluster. So we would try to emulate
        job dependencies (2 gives another reason for this).


    2. How to use dependencies without causing idle CPU resources? We have 4 processors per node on our system, and
        we haven't been able to use task dependencies without massive waste of cluster resources. For example, suppose
        I have a job with 5 tasks. The first 4 tasks are independent, but the 5th task generates a report from the results of
        the first 4 tasks, and hence depends on those tasks. What seems to happen is that the scheduler reserves 4
        processors for the 4 tasks, but then won't release them for tasks from other jobs once the 5th task is running.
        This seems to be because resources are allocated at the job level, not the task level. We do have backfilling turned on,
        and we are submitting everything as non-exclusive.

        Basically this means we don't use task dependencies, because they waste too many resources.

        It seems like we can emulate job dependencies with the activation filter, with the dependencies passed in the
        extended job terms. But it would be nice to know if there is a better way.
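
    To put a rough number on the waste in the 5-task example, here is a minimal sketch in Python. It assumes every task takes one unit of time and that the job holds all of its processors until the last task finishes; both the durations and the function name are hypothetical, chosen only for illustration.

```python
def reserved_vs_used(parallel_durations, final_duration, processors):
    """Return (reserved, used) processor-time for a job that keeps
    `processors` allocated for its whole lifetime.

    Assumes the independent tasks all fit on the allocation at once,
    i.e. len(parallel_durations) <= processors.
    """
    parallel_phase = max(parallel_durations)   # independent tasks run side by side
    wall_time = parallel_phase + final_duration
    reserved = processors * wall_time          # what the job holds
    used = sum(parallel_durations) + final_duration  # what it actually computes on
    return reserved, used


# 4 independent tasks plus 1 report task, 1 time unit each (assumed).
reserved, used = reserved_vs_used([1, 1, 1, 1], 1, processors=4)
idle_fraction = (reserved - used) / reserved
print(f"reserved={reserved} used={used} idle={idle_fraction:.1%}")
```

    Under these assumptions 3 of the 8 reserved processor-units sit idle while the report task runs, and the fraction grows as the final task gets longer relative to the parallel phase.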

    Thanks for listening everyone.

    Any ideas/advice?
    Saturday, May 31, 2008 3:50 PM

All replies

  •  

    You say you "only care that the cluster is kept busy 100% of the time, and that resources are shared fairly between the users."  I wouldn't include the "only;" what you describe is the ultimate goal that all job schedulers aspire to!  :-)

     

    I'll try to answer your questions below; be aware that two of them have been addressed in version 2, which is currently in beta. You can check out the beta at http://connect.microsoft.com

     

    Q: "How do you get the cluster scheduler to automatically prioritize jobs according to users current share of the running jobs?" 

    A: We don't currently have a built-in mechanism to do this, but it's something that we are looking at providing in a future version (v3 or v4 of the HPC Pack scheduler).  Your suggestions of using either Activation Filters or a monitoring task are both good ones.
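
    As an illustration of the monitoring-task idea, here is a minimal Python sketch of one rebalancing pass. It is not scheduler code: the `rebalance` function, the `(job_id, user)` tuples, and the band names in `PRIORITY_BANDS` are assumptions made for the example, and a real task would have to read the queue from, and write priorities back to, the scheduler through whatever interface it exposes.

```python
from collections import Counter

# Assumed priority bands, lowest to highest (hypothetical names for this sketch).
PRIORITY_BANDS = ["Lowest", "BelowNormal", "Normal", "AboveNormal", "Highest"]


def rebalance(running_jobs, queued_jobs):
    """Return {job_id: priority band} for the queued jobs.

    running_jobs and queued_jobs are lists of (job_id, user) tuples.
    Users with fewer running jobs get a higher band for their queued jobs.
    """
    running = Counter(user for _, user in running_jobs)
    # Distinct running-job counts among users who have something queued.
    levels = sorted({running[user] for _, user in queued_jobs})
    top = len(PRIORITY_BANDS) - 1
    return {
        job_id: PRIORITY_BANDS[max(0, top - levels.index(running[user]))]
        for job_id, user in queued_jobs
    }


running = [(1, "alice"), (2, "alice"), (3, "alice"), (4, "bob")]
queued = [(10, "alice"), (11, "bob"), (12, "carol")]
print(rebalance(running, queued))
```

    A periodic task would call something like this every few seconds. Note that it only ranks users against each other; it does not know whether a queued job is blocked on a dependency.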

     

     

    Q: "If we were using task dependencies then a typical job might have 5,000 tasks, with the width of the dependency tree varying over time. One user might use up the whole cluster, with one or more jobs, but then we would want the cluster to free up processors for other users if they went on the cluster."

    A: Have you taken a look at the Graceful Pre-emption scheduling policy (in Version 2)?  It will allow higher priority jobs to force lower priority jobs to shrink, and may help solve the problem that you are describing.

     

    Q: How to use dependencies without causing idle cpu resources?

    A: This is addressed by the Grow/Shrink scheduling policy in Version 2.

     

    Thanks,
    Josh

    Friday, June 6, 2008 11:33 PM
    Moderator
  •  

     

    Is there any way to let a lower or equal priority job, whose tasks are waiting on dependencies, be pre-empted by a task of a later job, which isn't being held up?

     

    I wrote an activation filter, to try to allow dependencies between jobs. Basically the dependency job ids are put in the extended terms of the job, and the filter checks whether those jobs have finished.
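
    The check inside such a filter can be sketched in Python as below. This only shows the decision logic: the `DependsOn` term name, the `"Finished"` state string, and the `job_states` lookup are hypothetical stand-ins, and a real filter would still have to read the job description it is handed and report its decision in the form the scheduler expects.

```python
FINISHED = "Finished"  # assumed state name, for illustration only


def parse_dependencies(extended_terms):
    """Read a comma-separated list of job ids from a hypothetical
    'DependsOn' extended term, e.g. {"DependsOn": "12, 15"}."""
    raw = extended_terms.get("DependsOn", "")
    return [int(part) for part in raw.split(",") if part.strip()]


def may_start(extended_terms, job_states):
    """True when every job listed as a dependency has finished.

    job_states maps job id -> state string; an unknown id counts as
    not finished, so the dependent job stays held.
    """
    return all(
        job_states.get(dep) == FINISHED
        for dep in parse_dependencies(extended_terms)
    )


states = {12: "Finished", 15: "Running"}
print(may_start({"DependsOn": "12, 15"}, states))  # job 15 is still running
print(may_start({"DependsOn": "12"}, states))
```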

     

    The problem I am having with that is that the scheduler is not allowing later jobs to run in front of jobs that are waiting on the activation filter. So I can't use this mechanism without holding most of the cluster idle. Is there some cluster setting that lets jobs run in front of other jobs that are waiting on an activation filter?

     

    One thing I should have added about dependencies is that there may be many "trees" of dependent jobs submitted to the cluster at the same time. As far as dependencies go, I basically want the "first" job that is free to run to be allowed to run on any free processor. Any scheme where processors can be held idle while they wait for something to finish, while another job could be running, stalls the cluster for us, because we have both large dependency trees and also large groups of smaller dependency trees (large = 10000, small = 50).

     

    Aside: The job scheduling policy we are trying to get the cluster to follow is this:

    1. When a processor becomes free, the cluster will run the "first" available job that is not held
        up by a dependency.
    2. The order that determines "first" is that higher priority jobs are run before lower priority jobs, and otherwise
        jobs submitted earlier are run before jobs submitted later. [For user load balancing, jobs are ordered by
        priority, then by how many processors the user is currently using, then by submission order.]
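
    The two rules above can be sketched in Python as a single selection function. The `Job` fields, the integer priority scale (larger means more important), and the `processors_in_use` mapping are assumptions for the example, not part of any scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    user: str
    priority: int        # assumed scale: larger value = higher priority
    submit_order: int    # smaller value = submitted earlier
    blocked: bool = False  # True while held up by a dependency


def next_job(queue, processors_in_use):
    """Pick the job to start on a newly freed processor.

    Rule 1: skip jobs held up by a dependency.
    Rule 2: order by priority (high first), then by how many processors
    the job's user currently holds (few first), then by submission order.
    Returns None when nothing is runnable.
    """
    runnable = [job for job in queue if not job.blocked]
    if not runnable:
        return None
    return min(
        runnable,
        key=lambda job: (
            -job.priority,
            processors_in_use.get(job.user, 0),
            job.submit_order,
        ),
    )


queue = [
    Job(1, "alice", priority=1, submit_order=1, blocked=True),
    Job(2, "alice", priority=1, submit_order=2),
    Job(3, "bob", priority=1, submit_order=3),
]
usage = {"alice": 8, "bob": 0}
print(next_job(queue, usage).job_id)
```

    In this example job 1 is skipped because it is blocked, and job 3 is chosen over job 2 because bob currently holds fewer processors than alice, even though alice submitted first.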

     

    For us, this policy is considered to be a solution to the problem of sharing resources fairly, and at the same
    time keeping the cluster 100% busy. [I understand that different applications call for different notions of "fair",
    and a different optimal policy, but for us this policy represents our needs.]

     

    Any ideas how we could get something like this going on our Windows cluster? We implemented something
    similar for running jobs on multiprocessor machines, using a priority queue to schedule the jobs that were free
    to run and a dependency notification mechanism that efficiently added jobs to the queue as soon as a group of
    dependencies was completed. On the cluster, it seems like it would be hard to circumvent the scheduling policy
    it has without all sorts of race conditions. What we are really looking for is some way to make the cluster run
    a job/task whenever there is a processor free and a job that is able to run without violating a dependency.
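
    For readers unfamiliar with that design, here is a minimal single-threaded Python sketch of a priority queue fed by dependency notifications. It is an illustration of the general technique, not our actual implementation; the class and method names are hypothetical, and a real version would need locking around the shared state.

```python
import heapq
from collections import defaultdict


class DependencyScheduler:
    """Ready jobs live in a heap; blocked jobs are released by notification."""

    def __init__(self):
        self._ready = []                      # heap of (sort_key, job_id)
        self._keys = {}                       # job_id -> sort_key
        self._waiting_on = {}                 # job_id -> unfinished dependency count
        self._dependents = defaultdict(list)  # job_id -> jobs waiting on it
        self._finished = set()

    def submit(self, job_id, sort_key, depends_on=()):
        """Queue a job; smaller sort_key runs first."""
        self._keys[job_id] = sort_key
        pending = {dep for dep in depends_on if dep not in self._finished}
        if not pending:
            heapq.heappush(self._ready, (sort_key, job_id))
            return
        self._waiting_on[job_id] = len(pending)
        for dep in pending:
            self._dependents[dep].append(job_id)

    def pop_ready(self):
        """Return the first runnable job id, or None if none is runnable."""
        return heapq.heappop(self._ready)[1] if self._ready else None

    def finish(self, job_id):
        """Mark a job finished and release dependents whose last dependency this was."""
        self._finished.add(job_id)
        for child in self._dependents.pop(job_id, []):
            self._waiting_on[child] -= 1
            if self._waiting_on[child] == 0:
                del self._waiting_on[child]
                heapq.heappush(self._ready, (self._keys[child], child))


sched = DependencyScheduler()
sched.submit(1, sort_key=1)
sched.submit(2, sort_key=2)
sched.submit(3, sort_key=3, depends_on=[1, 2])  # report job
sched.submit(4, sort_key=4)                     # unrelated later job
order = [sched.pop_ready() for _ in range(3)]
print(order)
```

    Because job 3 never enters the heap while it is blocked, the later job 4 is handed out ahead of it and no processor waits on the dependency. Each completion only touches the jobs that were waiting on it, so the cost does not grow with the total queue length.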


    Thursday, June 12, 2008 3:09 PM
  • One goal of our scheduler is to always schedule a job in such a way that it can finish . . . this means we try to allocate a job enough resources to run all of its tasks in parallel even if it can't use them at the moment.  So I'm afraid there may not be a good way to accomplish what you are looking to do.

    We're already looking at some changes to Activation Filters to support what you are asking for, and we are also looking at possibly adding Job Dependencies.  Unfortunately neither of these would be available before v3.


    Your application sounds interesting . . . can you provide some more information on the jobs that you are submitting?  How many tasks are in a job?  What is the flow?  What does your application do?


    Thanks and sorry for the bad news,
    Josh

    Monday, June 23, 2008 11:22 PM
    Moderator