Can the number of jobs submitted to a node exceed the number of cores on the node?

  • Question

  • Hi,

    I submit jobs to the cluster manager by setting the "Job template", so that the job gets scheduled according to the template. From my understanding and observation, the number of jobs that can execute concurrently on a node never exceeds the number of cores on that machine. Is there a way to bypass this limit, so that the number of concurrent jobs can exceed the number of cores?

    Thanks!
    Prashant  
    Monday, December 14, 2009 7:44 AM

Answers

  • Hi Prashs,

    The restriction is not on the number of jobs running on a node. The restriction is that the number of *tasks* running on a node will not exceed the number of cores on that node, because a "core" is the smallest unit of resource that can be requested.

    However, there is one case where this restriction can be bypassed: Remote Commands. For example, if a node has, say, 8 cores, then you may run more than 8 instances of clusrun on that node. Additionally, the clusrun tasks will not use up any cluster cores at all, and other jobs may still be scheduled on all 8 cores while the clusrun tasks are running. The HPC Job Scheduler API may also be used to create remote commands that bypass this restriction. However, the principal limitations are (i) only cluster admins may use clusrun or create Remote Commands; and (ii) these types of commands are not subject to any scheduling policies.

    Regards,

    Patrick

    Monday, December 14, 2009 6:42 PM
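
    For illustration, launching a remote command with clusrun is a one-liner from an admin command prompt. The node name `COMPUTE01` below is a placeholder; substitute a node from your own cluster:

    ```shell
    REM Runs "hostname" on the node COMPUTE01 (placeholder name), regardless of
    REM how many cores on that node are already occupied by scheduled tasks.
    REM Requires cluster administrator rights.
    clusrun /nodes:COMPUTE01 hostname
    ```

    Because this runs outside the scheduler's resource accounting, nothing stops you from issuing it many times against the same node, but the same caveat applies: the work is invisible to scheduling policies.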

All replies

  • Hi Prashs,

    It is by design that the number of jobs running on a node cannot exceed the number of cores on the node.

    Can you provide more details on what problems you want to resolve by doing this?

    thanks,

    liwei
    Monday, December 14, 2009 6:30 PM
  • @liwei
    I was just curious because there will be times when the tasks are not using the CPU (for example, while waiting on IO). So running more tasks than the total number of cores might be more efficient than restricting it to exactly the number of cores.

    Thanks!
    Prashant
    Monday, December 14, 2009 9:10 PM
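
    Prashant's point about IO-bound tasks can be illustrated with a small simulation (this is a generic sketch, not the HPC scheduler itself): IO waits are simulated with `time.sleep`, and the same 8 "tasks" are run first on a pool capped at 2 workers (one per pretend core), then on an oversubscribed pool of 8 workers:

    ```python
    import time
    from concurrent.futures import ThreadPoolExecutor

    def io_bound_task(_):
        # Stand-in for a task that spends its time waiting on disk or network IO
        # rather than using the CPU.
        time.sleep(0.2)

    def run_with(workers, n_tasks=8):
        """Run n_tasks IO-bound tasks on a pool of `workers` threads; return elapsed seconds."""
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(io_bound_task, range(n_tasks)))
        return time.perf_counter() - start

    if __name__ == "__main__":
        t_cores = run_with(2)  # capped at one task per pretend core: ~0.8s
        t_over = run_with(8)   # oversubscribed, all tasks wait in parallel: ~0.2s
        print(f"2 workers: {t_cores:.2f}s, 8 workers: {t_over:.2f}s")
    ```

    The oversubscribed run finishes roughly four times faster, which is why a per-core cap can be wasteful for IO-bound workloads, even though it is the right default for CPU-bound ones.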