Job priority vs. required resources

    Question

  • Hello,

    We have an HPC Server 2008 R2 SP1 cluster and are experiencing problems with the scheduler. There are some jobs with Normal priority in the queue that are all scheduled to run on specific nodes (like nodes 1-5), so these nodes are fully occupied. There are also some other jobs with Lowest priority that are scheduled to run on every node (like nodes 1-10). But for some reason they are not running and just sit in the queue, so nodes 6-10 are idle. I think this happens because of their priority: HPC Server sees that it has jobs with Normal priority in the queue and stops all Lowest priority jobs from running, despite the fact that they are actually scheduled to run on all nodes and could be started on the idle nodes!

    To make things clear, the simplified problem is the following. If you have 2 jobs in the queue, Job1 with Normal priority scheduled to run on Node1 and Job2 with Lowest priority scheduled to run on Node1, Node2, the second job will not run before the first job, even if Node2 is idle!
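The simplified case above can be written down as a toy Python model. This is only a sketch of the suspected behavior, not the actual HPC Server scheduling algorithm: the `Job` class, the `schedule` function, the numeric priorities, and the `backfill` switch are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int        # higher value is served first (assumed encoding)
    allowed_nodes: set   # nodes checked for this job
    min_nodes: int = 1   # nodes the job needs before it can start


def schedule(queue, idle_nodes, backfill):
    """Return {job name: nodes} for the jobs that can start right now.

    With backfill=False the queue is strictly ordered: as soon as one job
    cannot start, every job behind it waits too (the suspected behavior).
    With backfill=True a blocked job is skipped and later jobs may start.
    """
    idle = set(idle_nodes)
    started = {}
    # sorted() is stable, so jobs of equal priority keep submission order.
    for job in sorted(queue, key=lambda j: -j.priority):
        usable = sorted(idle & job.allowed_nodes)
        if len(usable) >= job.min_nodes:
            chosen = usable[:job.min_nodes]
            started[job.name] = chosen
            idle -= set(chosen)
        elif not backfill:
            break  # strict ordering: nothing behind a blocked job may run
    return started


# Node1 is busy, Node2 is idle.
job1 = Job("Job1", priority=2, allowed_nodes={"Node1"})
job2 = Job("Job2", priority=0, allowed_nodes={"Node1", "Node2"})

strict = schedule([job1, job2], idle_nodes={"Node2"}, backfill=False)
relaxed = schedule([job1, job2], idle_nodes={"Node2"}, backfill=True)
```

Under these assumptions `strict` is empty (Job2 waits behind Job1 although Node2 is idle), while `relaxed` starts Job2 on Node2, which is the behavior being asked for.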

    I already saw such behavior some time ago when we had HPC Server 2008. I don't know whether it is a bug or a feature. I tried playing with the scheduler parameters (switching from Queued to Balanced scheduling mode), but without success. Can you advise how to solve this problem, so that Lowest priority jobs run on idle nodes even if there are Normal priority jobs in the queue?

    Friday, February 11, 2011 5:28 PM

All replies

  • Hi Nikita,

    We'd need to get more information about the configuration of the cluster, and possibly a job history, to help out. For this issue, I would suggest you open a case with the HPC team so we can assist further. The initial information we'd need to collect can be obtained through the HPC PowerShell cmdlets.

    Cluster configuration:

    Get-HpcClusterProperty

    Job History:

    Get-HpcJobHistory (you can pass additional parameters to specify the date range; otherwise it will prompt for this)

    Kevin

    Wednesday, February 16, 2011 6:28 PM
  • Looks like you and the scheduler have a different understanding of "Job2 scheduled to run on Node1, Node2". The scheduler thinks Job2 requires both nodes, while you think it can start on 1 node.

    You might want to look at the Grow and Shrink feature in http://technet.microsoft.com/en-us/library/dd197402(WS.10).aspx, and submit Job2 with its expected maximum and minimum resource requirements.
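The difference between the two readings can be illustrated with a small Python sketch. It is a toy model of a minimum/maximum resource request, not the real scheduler logic; `try_start` and its parameters are hypothetical names used only for this example.

```python
def try_start(min_nodes, max_nodes, allowed_nodes, idle_nodes):
    """Return the nodes a job would get right now, or None if it must wait.

    The job starts only if at least min_nodes of its allowed nodes are
    idle, and it takes at most max_nodes of them.
    """
    usable = sorted(set(allowed_nodes) & set(idle_nodes))
    if len(usable) < min_nodes:
        return None
    return usable[:max_nodes]


allowed = {"Node1", "Node2"}
idle = {"Node2"}  # Node1 is occupied by Job1

# Reading 1: Job2 requires both nodes (min = max = 2), so it has to wait.
needs_both = try_start(2, 2, allowed, idle)

# Reading 2: Job2 can start on one node and use up to two (min = 1, max = 2).
can_grow = try_start(1, 2, allowed, idle)
```

In this model `needs_both` is `None`, while `can_grow` is `["Node2"]`, so a job submitted with a minimum of 1 is not held back by the busy node.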

    • Proposed as answer by Zhen WEI MSFT Saturday, March 12, 2011 3:53 AM
    • Unproposed as answer by Nikita Tropin Monday, March 14, 2011 3:11 AM
    Saturday, March 12, 2011 3:52 AM
  • I set min and max for both the job and its tasks to 1 core and checked the nodes on the Resource Selection tab. Besides that, this problem doesn't exist when all jobs have the same priority and are all scheduled to run on Node1, Node2. They don't wait until all checked nodes become free, but start on the first available core one by one.
    Monday, March 14, 2011 3:11 AM
  • Let me forward your question to Windows HPC Server Job Submission and Scheduling.
    Monday, March 14, 2011 6:44 AM