Several Tasks dependency

    Question

  • I have a batch file that launches the jobs; something like this:

    for /f ....  do (
    job add %%i /name:"JobOne"

    job add %%i /name:"JobTwo"

    job add %%i /name:"JobThree"

    job add %%i /name:"JobFour" /depend:"JobThree" ...

    job add %%i /name:"JobFive" /depend:"JobThree" ...

    job add %%i /name:"JobSix" /depend:"JobThree" ...

    job add %%i /name:"JobSeven" /depend:"JobThree" ...

    job submit /sched......

    )

    In this case, tasks 4/5/6/7 should not be executed until task 3 finishes, right?
    In fact, that is not what happens:

    when task 3 finishes, tasks 4/5/6/7 are still waiting for tasks 1/2 to finish!

    What can I do?

    Wednesday, February 26, 2014 12:06 PM

All replies

  • What command did you use for the test?

    If the tasks ran for a very short period, for example, 10 seconds, the behavior might look strange.

    Please use several long-running tasks to test again.

    HPC Scheduler calculates and allocates resources (cores/sockets/nodes) when a job is started. In your case, that might be 3 cores.

    When a batch of tasks becomes runnable, the Scheduler needs to allocate more resources for them.

    This re-allocation happens at a long interval of more than 10 seconds, which may be the reason why you did not observe the behavior you wanted.
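    For example, a long-running placeholder task can be created with ping; a sketch in the style of the batch file above (the task name "LongOne" and the 600-ping duration are arbitrary choices, not from the original thread):

    ```shell
    :: ping -n 600 sends 600 echo requests roughly one second apart,
    :: giving a ~10-minute placeholder task for scheduler tests.
    job add %%i /name:"LongOne" ping 127.0.0.1 -n 600
    ```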

    Thursday, February 27, 2014 9:52 AM
  • In fact, every task takes more than 10 minutes, and there are enough free cores.
    If you need any other information, I will provide it.
    • Edited by Smida.A Thursday, February 27, 2014 10:49 AM
    Thursday, February 27, 2014 10:16 AM
  • What's the version of your HPC system?

    Are there any special settings, such as a minimum core number?

    Friday, February 28, 2014 1:29 AM
  • HPC 2008 R2

    gridMaster: 1
    gridSlaves: 5
    Allocated cores per node: 4
    Job file:

    <IsBackfill="false" NextTaskNiceID="1" HasGrown="false" HasShrunk="false" OrderBy="" RequestCancel="None" RequeueCount="0" AutoRequeueCount="0" FailureReason="None" PendingReason="None" AutoCalculateMax="true" AutoCalculateMin="true" ParentJobId="0" ChildJobId="0" NumberOfCalls="0" NumberOfOutstandingCalls="0" CallDuration="0" CallsPerSecond="0" ProjectId="1" JobTemplateId="1" OwnerId="3" ClientSourceId="3" Project="" JobTemplate="Default" DefaultTaskGroupId="15" xmlns="http://schemas.microsoft.com/HPCS2008/scheduler/">

    Friday, February 28, 2014 10:10 AM
  • What's the version number?

    Where did you get this data?

    The XML seems different from my exported job XML in 2008 R2.

    Monday, March 03, 2014 2:35 AM
  • version : 3.3.3950.0

    The data above is just part of the config file that is called when the job is executed.
    Here is the exported job XML:
    <?xml version="1.0" encoding="utf-8"?>
    <Task Version="3.000" Id="153068" ParentJobId="8606" State="Finished" UnitType="Core" WorkDirectory="path" NiceId="1" CommandLine="MyProg.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="56232" SubmitTime="date time" StartTime="date time" CreateTime="date time" Name="tsakname" MinCores="1" MaxCores="2" AutoRequeueCount="0" Type="Basic" FailJobOnFailure="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/" />

    • Edited by Smida.A Wednesday, March 12, 2014 9:59 AM
    Monday, March 03, 2014 10:50 AM
  • Can anyone help me?
    Friday, March 07, 2014 10:29 AM
  • Hi Smida,

    Is it possible that the job's core resources were not enough to run tasks 4/5/6/7 after task 3 finished, while tasks 1/2 were still running? Notice that the job's core resources were set to AutoCalculateMax/Min, which means the job scheduler would only count the cores required by tasks 1/2/3. If the number of cores released by task 3 was less than the minimum core requirement of tasks 4/5/6/7, then they would have to wait for tasks 1/2 to finish and free more resources.

    If that is the case, the workaround is simply to specify the job's min/max resources to allow for the possible concurrent tasks.
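    As a sketch of that workaround, the job could be created with enough cores up front for all seven tasks (this assumes the /numcores:min-max switch of the HPC Pack job CLI; "gridwrks0" is the head node named elsewhere in this thread):

    ```shell
    :: Request 7 cores (min 7, max 7) up front so the dependent tasks
    :: do not have to wait for the scheduler to grow the job's allocation.
    job new /numcores:7-7 /scheduler:gridwrks0
    ```

    With a fixed allocation, the scheduler no longer needs to grow the job when the dependent tasks become runnable.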

    Regards,

    Yutong

    Sunday, March 16, 2014 3:43 PM
  • Hello Yutong,

    In fact, there are 12 cores and every task takes 1 core,
    so when task 3 finishes, tasks 4/5/6/7 should run without waiting for 1 and 2, which is not the case.

    Monday, March 17, 2014 10:34 AM
  • Hi Smida,

    Sorry for my "disappearance". I set an alert on this thread, but no notification was received when you replied to my posts...

    I noticed that you used a job config file, right?

    To isolate the issue you met, let us do a simple test without job config file:

    C:\>job new
    Created job, ID: 1473

    C:\>job add 1473 /name:"One" ping 127.0.0.1 -n 300
    Task 1473.1 added.

    C:\>job add 1473 /name:"Two" ping 127.0.0.1 -n 300
    Task 1473.2 added.

    C:\>job add 1473 /name:"Three" ping 127.0.0.1 -n 60
    Task 1473.3 added.

    C:\>job add 1473 /name:"Four" /depend:"Three" ping 127.0.0.1 -n 180
    Task 1473.4 added.

    C:\>job add 1473 /name:"Five" /depend:"Three" ping 127.0.0.1 -n 180
    Task 1473.5 added.

    C:\>job add 1473 /name:"Six" /depend:"Three" ping 127.0.0.1 -n 180
    Task 1473.6 added.

    C:\>job add 1473 /name:"Seven" /depend:"Three" ping 127.0.0.1 -n 180
    Task 1473.7 added.

    C:\>job submit /id:1473
    Job has been submitted. ID: 1473.

    I tried this in my own environment and task "Four" started immediately after "Three" finished.

    Several seconds later, when more cores were allocated, "Five", "Six" and "Seven" started to run.

    In Cluster Manager - Options - Job Scheduler Configuration - Policy Configuration, my Scheduling mode is "Queued" and both Increase/Decrease resource automatically are checked.

    Please try this simple test to see whether the root cause is the job config file you use.

    Wednesday, March 19, 2014 4:42 AM
  • Hello SnOoPy,

    I checked Cluster Manager - Options - Job Scheduler Configuration - Policy Configuration, and I found the same configuration as yours.

    Then I tried the simple test above, and tasks Four/Five/Six/Seven still wait for tasks One and Two even after task Three finishes.

    Another remark: if I write
    job submit /id:1473   --> it returns this message: "No Connection could be made because the target machine actively refused it ::1:5800"

    I must add "/scheduler:gridwrks0" to make it work.
    I have 6 nodes, and gridwrks0 is the head node (every node has 4 cores).

    --> So it's clear that the problem is not in the config file I'm using :D
    Are there any other options I should check?
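    As an aside, passing /scheduler on every command can usually be avoided by setting the CCP_SCHEDULER environment variable that the HPC Pack client utilities read (a sketch, reusing the head node name from this thread):

    ```shell
    :: Point job.exe and the other HPC Pack client tools at the head node
    :: so /scheduler no longer has to be repeated on every command.
    set CCP_SCHEDULER=gridwrks0
    job submit /id:1473
    ```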



    • Edited by Smida.A Wednesday, March 19, 2014 1:34 PM
    Wednesday, March 19, 2014 1:26 PM
  • Hello Smida,

    I've built a brand new HPC V3 SP3 3950 cluster and reproduced the problem you hit.

    It looks like a bug in V3 SP3, so I tried to find an official fix for it.

    Fortunately, I found KB2690584 at http://www.microsoft.com/en-us/download/details.aspx?id=29253.

    You might think the description of this KB is unrelated to your problem, but it does work.

    The root cause must be in the task validation process.

    That said, V3 SP3 is an old version; please upgrade to SP4 and install all the KBs for SP4 if possible.



    • Edited by SnOoPy1214 Thursday, March 20, 2014 5:02 AM
    Thursday, March 20, 2014 5:01 AM
  • Thank you a lot, SnOoPy.

    In fact, we have Windows Server 2008 R2 Standard SP1, not SP3.


    Monday, March 24, 2014 8:52 AM
  • When I said SP, I meant the HPC service pack.

    3.3.3950.0 is HPC V3 SP3 with no KBs installed, and the bug you hit has already been fixed in SP4.

    You can search the Microsoft Download Center for HPC Pack 2008 R2 Service Pack 4 and all the KBs for it.


    • Edited by SnOoPy1214 Monday, March 24, 2014 8:57 AM
    Monday, March 24, 2014 8:57 AM
  • Got it;

    I will let you know when it's done.

    Thank you.

    Monday, March 24, 2014 9:55 AM
  • Hello again, SnOoPy1214.
    I have two questions, if you don't mind:
     1- Is there any risk of system instability after upgrading HPC? You know, this system is kind of critical, and we must be sure that we will not have any problems.

     2- On this version we cannot create dependencies between jobs; will we get that option with SP4?

    Thank you again.
    Wednesday, March 26, 2014 9:28 AM
  • HPC 2008 R2 does not support job dependencies, no matter which Service Pack is installed. It's a new feature in HPC 2012.

    There should be no risk in upgrading HPC, but if you have any concerns, it's OK to stay on SP3, since the bug can be fixed by the KB.
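    For reference, a sketch of a job-level dependency in HPC Pack 2012, assuming the /parentjobids switch of its job CLI (the job IDs 10 and 11 are hypothetical):

    ```shell
    :: Create a job that stays queued until jobs 10 and 11 have finished
    :: (HPC Pack 2012 and later only; not available in 2008 R2).
    job new /parentjobids:10,11
    ```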

    Thursday, March 27, 2014 1:33 AM
  • many thanks
    Thursday, March 27, 2014 10:40 AM