Multiple jobs not starting until previous job is completely done

  • Question


    I have submitted 2 jobs, each consisting of several hundred individual tasks; each task runs on 1 of 32 processors at a time. The problem is that fewer than 32 tasks remain in the first job, so there are processors available for the second job to start running some tasks. However, the second job is not starting: it is pending due to "Not enough available processors".

     

     

    I submitted both jobs with a template. For each task, I set IsExclusive="false", MaximumNumberOfProcessors="1", and MinimumNumberOfProcessors="1".

     

     

    Is there a way to make the 2nd job start without waiting for every task to finish for the first job?

     

    ( sample .xml from job template: )

     

    <?xml version="1.0" encoding="utf-8"?>
    <Job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" SoftwareLicense="" AskedNodes="BT2,BT3,BT4,BT5,BT6,BT7,BT8,BT9" MaximumNumberOfProcessors="32" MinimumNumberOfProcessors="1" Runtime="Infinite" IsExclusive="false" Priority="Highest" Name="job1" Project="Proj" RunUntilCanceled="false">
    <Tasks xmlns="http://www.microsoft.com/ComputeCluster/">
    <Task MaximumNumberOfProcessors="1" MinimumNumberOfProcessors="1" Depend="" WorkDirectory="\\BT1\runOpt\20080124"  Stdout=".\200704\launchOpt_2007_04_1.out" Stderr=".\200704\f_1.err" Name="task1" CommandLine="job1 1"  IsExclusive="false" IsRerunnable="true" Runtime="Infinite">
    <EnvironmentVariables />
    </Task>
    <!-- remaining tasks omitted from this sample -->
    </Tasks>
    </Job>
    • Moved by Josh Barnard (Moderator), Thursday, March 26, 2009 12:27 AM (moved from Windows HPC Server Developers - General to Windows HPC Server Job Submission and Scheduling)
    Wednesday, January 30, 2008 3:43 PM

Answers

  • Hi,

     

    We’ve heard feedback about this kind of “long tail” problem before. Another case is if you were doing a large parametric sweep with 10K tasks across 64 nodes. Imagine that all the tasks except for a couple take only a second to complete. The final two tasks take a few hours. All 64 nodes will be tied up until those last two tasks complete.

     

    Windows HPC Server 2008 (our next release) will include a Grow/Shrink job scheduling policy. When enabled, this policy allows the scheduler to shrink the number of nodes allocated to a job. In the case above, the number of nodes allocated to the job would shrink to two.
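To make the difference concrete, here is a minimal Python sketch of the parametric sweep described above. It is not the Windows HPC scheduler's actual algorithm. It is a toy model that assumes one processor per node, tasks dispatched in order to whichever processor frees up first, and a "few hours" taken as 3 hours. The function name `second_job_start` and its parameters are hypothetical, chosen only for illustration.

```python
import heapq


def second_job_start(durations, processors, shrink):
    """Return the earliest time (seconds) a second job could get a processor.

    Toy model, not the real scheduler: tasks run one per processor and are
    handed out in order to whichever processor becomes free first.
    shrink=False: the first job holds every processor until its last task ends.
    shrink=True:  a processor is released as soon as it has no more work.
    """
    free_at = [0.0] * processors  # time at which each processor becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available processor
        heapq.heappush(free_at, start + duration)
    if shrink:
        return min(free_at)  # first processor handed back to the scheduler
    return max(free_at)      # nothing is released until the job completes


if __name__ == "__main__":
    # Assumed workload: 9,998 one-second tasks, then two 3-hour tasks.
    sweep = [1.0] * 9998 + [3 * 3600.0] * 2
    print("fixed allocation:", second_job_start(sweep, 64, shrink=False))
    print("shrinking allocation:", second_job_start(sweep, 64, shrink=True))
```

Under these assumptions, the fixed allocation keeps the second job waiting until the two long tasks finish, while the shrinking allocation frees processors as soon as the short tasks run out.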

     

    You can download the beta of Windows HPC Server 2008 from http://connect.microsoft.com.

     

    Ryan Waite

    Saturday, February 2, 2008 11:09 PM