Achieve immediate and graceful preemption in the same cluster

  • Question

  • How do I use graceful preemption in general and still have one type of job immediately preempt other jobs on a subset of the cluster?

    To illustrate with two job types and two node groups:

    Job type A can only run on node group 1, while type B can use both node group 1 and node group 2. I want task level graceful preemption in general, but a type A job should immediately preempt type B tasks on the nodes that type A can use (node group 1).

    The SDK code example that uses dynamic node groups roughly achieves that, but how should I trigger growing and shrinking node group A? In other words, which events should I listen for to decide when to add nodes to or remove nodes from group A?

    Or should I run it at a regular interval as a scheduled task?

    Or should I use a job template level activation filter for type B tasks to handle increasing the number of nodes in group B, and a job template level activation filter for type A tasks to handle decreasing the number of nodes in group B? (The latter should work, but I am not sure the former would ever be activated.)

    Friday, September 4, 2015 4:13 PM

All replies

  • Hi,

      Are the nodes in Node Group A and Node Group B identical? If so, check whether resource pools can help in your scenario (see https://technet.microsoft.com/en-us/library/hh859715.aspx). With resource pools, you could remove node groups A and B.

      However, even with resource pools, you cannot have two preemption policies in the system. Based on your description, you can get the behavior you want by canceling and re-queuing type B jobs (give type B jobs a lower priority) whenever you find queued type A jobs. You can put this logic in a job template level submission filter:

    1. When a type A job is submitted, check whether there are enough idle cores for it.

    2. If so, submit it.

    3. If not, check whether any type B jobs are running. If so, set HoldUntil to the current time + 30 seconds on all queued type B jobs, cancel the running type B jobs, and re-queue them to run at the current time + 10 seconds.

    4. Submit the type A job.
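    The four steps above can be sketched as follows. This is a minimal Python sketch, not HPC Pack filter code: Job, Cluster and on_type_a_submitted are hypothetical names, the cluster state is an in-memory stand-in for what a real submission filter would read from the scheduler, and "cancel and re-queue" is modeled by putting a running job back into the Queued state with a HoldUntil time. Following step 3 literally, it cancels every running type B job rather than only as many as needed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Job:
    job_id: int
    job_type: str                      # "A" or "B"
    cores: int
    state: str                         # "Queued" or "Running"
    hold_until: Optional[datetime] = None

@dataclass
class Cluster:
    # Hypothetical in-memory stand-in for the scheduler's view of the cluster.
    idle_cores: int
    jobs: List[Job] = field(default_factory=list)

def on_type_a_submitted(cluster: Cluster, new_job: Job, now: datetime) -> List[str]:
    """Apply steps 1-4 to a newly submitted type A job and return the actions taken."""
    actions: List[str] = []
    # Steps 1-2: if enough cores are idle, just submit the type A job.
    if cluster.idle_cores >= new_job.cores:
        actions.append(f"submit {new_job.job_id}")
        return actions
    # Step 3: if type B jobs are running, hold the queued ones and requeue the running ones.
    running_b = [j for j in cluster.jobs if j.job_type == "B" and j.state == "Running"]
    if running_b:
        for j in cluster.jobs:
            if j.job_type == "B" and j.state == "Queued":
                j.hold_until = now + timedelta(seconds=30)
                actions.append(f"hold {j.job_id}")
        for j in running_b:
            # "Cancel" frees the job's cores; "re-queue" puts it back with a short hold.
            cluster.idle_cores += j.cores
            j.state = "Queued"
            j.hold_until = now + timedelta(seconds=10)
            actions.append(f"cancel+requeue {j.job_id}")
    # Step 4: submit the type A job.
    actions.append(f"submit {new_job.job_id}")
    return actions

now = datetime(2015, 9, 6, 2, 49)
cluster = Cluster(idle_cores=2, jobs=[Job(1, "B", 8, "Running"), Job(2, "B", 4, "Queued")])
print(on_type_a_submitted(cluster, Job(3, "A", 8, "Queued"), now))
# ['hold 2', 'cancel+requeue 1', 'submit 3']
```

    Holding the queued type B jobs for longer than the re-queued ones presumably keeps them from grabbing the freed cores before the type A job starts; the exact hold times are the values from the steps above, not tuned recommendations.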

    Of course, you can also write your own scheduled task that moves nodes between node group A and node group B; that should work too. Putting this logic in an activation filter may not work for your scenario.
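    A scheduled task that moves nodes between node group A and node group B could run a pass like the one below at a fixed interval. This is a hypothetical Python sketch: Node, rebalance and the type A core demand passed in are illustrative assumptions, not HPC Pack APIs, and a real task would read node and job state from the scheduler and apply the group changes through it. Only nodes flagged as eligible for type A (node group 1 in the question) are moved into group A, and only idle nodes are moved back out, so running type A tasks are not disturbed.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Node:
    name: str
    cores: int
    group: str             # current dynamic group: "A" or "B" (hypothetical)
    eligible_for_a: bool   # True for nodes in node group 1
    idle: bool = True      # no tasks currently running on the node

def rebalance(nodes: List[Node], type_a_cores_needed: int) -> Dict[str, List[str]]:
    """One pass of a periodic task: resize group A to match type A demand."""
    moved: Dict[str, List[str]] = {"to_A": [], "to_B": []}
    in_a = [n for n in nodes if n.group == "A"]
    capacity_a = sum(n.cores for n in in_a)
    if capacity_a < type_a_cores_needed:
        # Grow group A using eligible nodes that are currently in group B.
        for n in nodes:
            if capacity_a >= type_a_cores_needed:
                break
            if n.group == "B" and n.eligible_for_a:
                n.group = "A"
                capacity_a += n.cores
                moved["to_A"].append(n.name)
    else:
        # Shrink group A: release idle nodes while enough cores remain for demand.
        for n in reversed(in_a):
            if n.idle and capacity_a - n.cores >= type_a_cores_needed:
                n.group = "B"
                capacity_a -= n.cores
                moved["to_B"].append(n.name)
    return moved

nodes = [
    Node("n1", 8, "A", True),
    Node("n2", 8, "B", True),
    Node("n3", 8, "B", False),
]
print(rebalance(nodes, 16))  # {'to_A': ['n2'], 'to_B': []}
print(rebalance(nodes, 0))   # {'to_A': [], 'to_B': ['n2', 'n1']}
```

    How often to run the pass, and whether the demand should count queued type A work only or queued plus running, are choices you would need to tune for your workload.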

    Qiufang Shi

    Sunday, September 6, 2015 2:49 AM