Issue in job chaining

    Question

  • Hello,

    We are using the job chaining feature of HPC in our current project and are currently stuck at one point. In our case, we have a single job, master-job, which depends on 200 other jobs. Basically, master-job has to be executed once all 200 jobs have finished executing. We are using the ParentJobId property of this job to add the IDs of the 200 parent jobs. Unfortunately, master-job never gets executed, and we keep getting the message below:

    "the job is pending: Some parent jobs of this job are canceled or failed".

    So it looks like all the parent jobs have to finish successfully, but in our case a few jobs fail. How can we achieve the desired chaining behavior, where the child job executes once all parent jobs have finished, irrespective of their completion statuses (finished, canceled, or failed)? Please help us out.

     


    Puneet Sharma

    Thursday, May 10, 2018 12:28 AM

Answers

  • Hi Sharma,

      The system is designed to run a child job only when all of its parent jobs have finished successfully. This allows the user to fix the issue and requeue the failed job.
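      As a rough illustration only (this is not HPC Pack code), the rule above and the fork-join behavior asked for in the question can be modeled in a few lines of Python. The state names are the ones used in this thread; the function names are made up for the sketch.

```python
# Terminal states named in this thread; a parent in any of them has stopped running.
TERMINAL_STATES = {"Finished", "Failed", "Canceled"}


def child_can_start_default(parent_states):
    """Rule described above: every parent job must have Finished successfully."""
    return all(state == "Finished" for state in parent_states)


def child_can_start_join(parent_states):
    """Rule requested in the question: every parent reached any terminal state."""
    return all(state in TERMINAL_STATES for state in parent_states)


# 200 parent jobs, two of which did not succeed.
parents = ["Finished"] * 198 + ["Failed", "Canceled"]
print(child_can_start_default(parents))  # False: the child stays pending
print(child_can_start_join(parents))     # True: all parents are done
```

      The workaround below effectively makes the first rule behave like the second by forcing every parent job into the "Finished" state.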

      A quick workaround for you is to always complete the job in the "Finished" state:

    - In your job, add one task that depends on all the existing tasks in the job. This task is responsible for setting the job's state to "Finished" instead of "Failed" when the task failures are false alarms.

    - The task command line: job finish %CCP_JOBID% /message:"finish job from task"

      With this workaround, however, you need to check the task statuses in the job to determine whether it actually failed.
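      The two pieces of this workaround can be sketched in Python, under some assumptions. The command line is the one quoted above. The task name "finisher" and the dictionary of task states are hypothetical, and how you query the task states from the scheduler is deliberately left out.

```python
def build_finish_command(job_id=None, message="finish job from task"):
    """Return the command line for the final task that forces the job to Finished.

    With no job_id, the %CCP_JOBID% placeholder from the workaround is kept,
    so the ID is filled in when the task runs inside the job.
    """
    target = "%CCP_JOBID%" if job_id is None else str(job_id)
    return f'job finish {target} /message:"{message}"'


def job_really_succeeded(task_states, finisher_name="finisher"):
    """True if every task except the finishing task ended in the Finished state.

    task_states is a hypothetical mapping of task name -> state string.
    The finishing task is skipped because it only exists to close the job.
    """
    return all(
        state == "Finished"
        for name, state in task_states.items()
        if name != finisher_name
    )


print(build_finish_command())
states = {"step1": "Finished", "step2": "Failed", "finisher": "Finished"}
print(job_really_succeeded(states))  # False: step2 failed although the job is Finished
```

      The child job can then run a check like this on each parent before doing its own work, since the job state alone no longer tells you whether a parent succeeded.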

      Alternatively, you could submit a feature request for this behavior, and we could investigate it on our side.


    Qiufang Shi

    Friday, May 11, 2018 3:06 AM

All replies

  • Dear Qiufang,

    The workaround will help us, as it was one of our testing tools that was using this chaining. However, for project-level integration we need fork-join behavior at both the job level and the task level. I will send a request for this new feature to the support group and provide more details.

    Thanks for helping us out. I really appreciate it.


    Puneet Sharma

    Friday, May 11, 2018 5:28 PM