Requeue Task to use a different set of nodes if possible


    Hello,

    I am using HPC Pack 2012.
    I would like to know whether, when I requeue a task, there is a way to ask the HPC scheduler to rerun it on a different node from the one that ran it previously. My goal is to rerun tasks that ran but returned an error code on a different node, to mitigate problems caused by sporadic node failures.

    I normally use the ISchedulerJob.RequeueTask API to requeue tasks, but it has no option to do what I described.

    Thanks

    Wednesday, May 17, 2017 9:30 AM

Answers

    Hi,

    I'm afraid there is no specific command for task-level node exclusion. Here is what you can do instead:

    1. Use job-level excluded nodes: "job modify <jobid> /addexcludednodes:<nodename>". Your requeued tasks then won't run on the nodes where they failed. If you find that quite a few tasks failed on a particular node, you can exclude that node from the running job.

    2. Use a node preparation task: tasks usually fail on certain nodes because some prerequisite is not ready on those nodes, so you can provide a node preparation task that makes sure each node is ready for your job. The node preparation task runs on a node when that node is first allocated to the job, and if the node preparation task fails, the node is excluded from the job.
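    For the excluded-nodes option, the bookkeeping can be sketched in Python: given the node and exit code of each finished task, pick the nodes to exclude and the failed tasks to requeue. The TaskResult record, the plan_exclusions helper and its min_failures threshold are hypothetical; collecting the task results (for example, from the scheduler API you already use) is left out, and only the "job modify <jobid> /addexcludednodes:<nodename>" syntax comes from the answer.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record of one finished task: its ID, the node it ran on, its exit code."""
    task_id: int
    node: str
    exit_code: int


def plan_exclusions(job_id, results, min_failures=1):
    """Return (exclude_commands, tasks_to_requeue).

    A node is excluded once at least `min_failures` tasks failed on it.
    Every failed task is listed for requeueing, whichever node it ran on.
    """
    failed = [r for r in results if r.exit_code != 0]
    failures_per_node = Counter(r.node for r in failed)
    bad_nodes = sorted(n for n, count in failures_per_node.items() if count >= min_failures)
    # One command per node, using the syntax given in the answer:
    #   job modify <jobid> /addexcludednodes:<nodename>
    commands = [f"job modify {job_id} /addexcludednodes:{node}" for node in bad_nodes]
    tasks_to_requeue = sorted(r.task_id for r in failed)
    return commands, tasks_to_requeue
```

    Presumably the exclusion commands should run before the tasks are requeued (for example with ISchedulerJob.RequeueTask), so the scheduler has no chance to dispatch a requeued task back to the node it just failed on. Raising min_failures corresponds to excluding only the nodes on which quite a few tasks failed.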
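    For the node preparation option, a minimal sketch, assuming Python is installed on the compute nodes: the preparation task can be a small script that exits with a non-zero code when something the job needs is missing. The script name check_prereqs.py and the example paths are hypothetical, and checking that paths exist is only one possible test; use whatever checks your tasks actually depend on.

```python
import os
import sys


def missing_paths(paths):
    """Return the subset of `paths` that do not exist on this node."""
    return [p for p in paths if not os.path.exists(p)]


def main(paths):
    """Report missing prerequisites and return a process exit code."""
    missing = missing_paths(paths)
    for p in missing:
        print(f"prerequisite missing: {p}", file=sys.stderr)
    # Exit code 0 means the node looks ready; a non-zero code makes the
    # node preparation task fail, which excludes the node from the job.
    return 1 if missing else 0


# The paths to check are passed on the command line (hypothetical example):
#   python check_prereqs.py C:\Apps\MySolver\solver.exe D:\Scratch
# Without arguments the script does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```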


    Qiufang Shi

    • Marked as answer by cguevaramari Monday, May 22, 2017 8:12 AM
    Monday, May 22, 2017 3:55 AM