Restricting number of jobs of one type that can be run on a node while allowing others

  • Question

  • Hi,

    Is there a way to tell the cluster manager to allow only one instance of a job of type "A" to run on a particular node, while allowing as many instances of another job of type "B" as will fit? I have tried the following solutions:

    1) Setting the allocation unit of job A to "Node" and the allocation unit of other jobs to "Core". This restricts the number of instances of job type A running on a node to one, but it also prevents any other jobs from running on that node.
    2) Setting the "Exclusive" option to true has the same effect.

    Right now I am looking at a workaround using a license check-out activation filter. The idea is to restrict the licenses available for the software to 1 on each node and then somehow make HPC manager check for licenses before submitting a job. I am not yet sure whether this will work in my case.

    Are there any other options to get this to work?

    Thanks!
    Prashant 
    Tuesday, December 8, 2009 5:29 PM

Answers

  • Hi Prashant,

    Assuming you can't actually fix your application so that you can run multiple instances of it on the same node, unfortunately there aren't too many methods available for doing what you want.

    There is only one scheduling queue, so the only ways to prevent two already-queued JobAs from being scheduled on the same node are to (i) use the Exclusive job option for each JobA; (ii) request a resource of one node for each JobA; (iii) cancel one of the queued JobAs to prevent it from being scheduled; or (iv) block the entire queue until the second queued JobA can be scheduled. Options (i) and (ii) are the recommended methods but, as you said, they prevent any other jobs from being scheduled on those nodes at the same time. Options (iii) and (iv) can be achieved using submission/activation filters, but you end up with canceled jobs or a blocked queue.

    Conceivably you could emulate a second queue by writing some standalone code or script using the Job Scheduler APIs or HPC PowerShell to manage your submission of JobAs, so that you do not even submit a new JobA unless there is an available node. If there is an available node and there are no JobAs queued up waiting for that node, your code/script can specify that available node as a requested resource for the new instance of JobA and submit it.
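
The "second queue" idea above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not code from the Job Scheduler API or HPC PowerShell: the class `JobADispatcher`, the `submit` callback, and the node names are all hypothetical. In a real setup, `submit` would wrap whatever scheduler call you use to submit a JobA with the chosen node as its requested resource, and `job_finished` would be driven by polling or scheduler events.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class JobADispatcher:
    """Client-side holding queue that keeps at most one JobA per node.

    All names here are hypothetical; nothing in this class talks to a
    real scheduler. `submit(job_name, node_name)` is a placeholder for
    the actual submission call.
    """
    nodes: List[str]
    submit: Callable[[str, str], None]
    pending: List[str] = field(default_factory=list)   # JobAs not yet submitted
    busy: Dict[str, str] = field(default_factory=dict)  # node -> JobA queued/running on it

    def free_node(self) -> Optional[str]:
        """Return the first node with no JobA queued or running, if any."""
        for node in self.nodes:
            if node not in self.busy:
                return node
        return None

    def dispatch(self) -> None:
        """Submit pending JobAs, in order, while free nodes remain."""
        while self.pending:
            node = self.free_node()
            if node is None:
                return  # hold remaining JobAs outside the scheduler queue
            job = self.pending.pop(0)
            self.busy[node] = job
            self.submit(job, node)

    def enqueue(self, job_name: str) -> None:
        """Add a JobA to the holding queue and try to place it."""
        self.pending.append(job_name)
        self.dispatch()

    def job_finished(self, node: str) -> None:
        """Mark the node's JobA as done and place the next pending JobA."""
        self.busy.pop(node, None)
        self.dispatch()


if __name__ == "__main__":
    submitted = []
    dispatcher = JobADispatcher(
        nodes=["NODE1", "NODE2"],
        submit=lambda job, node: submitted.append((job, node)),
    )
    for name in ("A1", "A2", "A3"):
        dispatcher.enqueue(name)
    print(submitted)          # A3 is still held back at this point
    dispatcher.job_finished("NODE1")
    print(submitted)          # A3 has now been placed on NODE1
```

Because a held-back JobA never enters the real scheduler queue, it cannot block or be canceled there, and jobs of type B are submitted normally and remain free to use the other cores on the same nodes.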

    Regards,

    Patrick

    Wednesday, December 9, 2009 12:38 AM

All replies

  • Hi Prashant,

    I think using a Job Activation Filter is a good solution. They are very useful for managing the scheduling of license-constrained jobs.

    However, it's not clear whether licensing is the original reason you wish to restrict instances of job A running on a node, or whether you are introducing licensing as a way to implement a workaround.

    Regards,

    Patrick
    Tuesday, December 8, 2009 7:33 PM
  • Thanks for the reply, Patrick!

    The reason I want to restrict job A to only one instance is that the software I am running as part of the job has a problem that forces us to limit the number of concurrent jobs to 1 per node.

    I did look into using an activation filter with some licensing scheme as a workaround (at this point I am not sure whether I will use it), but it would be better if there were a flag in the job template or job saying that jobs submitted to this queue are allowed only 1 instance per node, while other types of jobs are still allowed on that node.

    Regards,
    Prashant
    Tuesday, December 8, 2009 8:16 PM