Run task in all compute nodes of the cluster

  • Question

  • Hello,

    I am using HPC 2012.
    I am looking for a way to run the first task of a job on all compute nodes.
    My goal is to use it to install or update some resources needed by the job.
    According to the documentation, a NodePrep task may do what I want.

    I have some questions, however.
     - A NodePrep task will not run at the beginning on all nodes, but only on the nodes allocated to the job, right?
     - If the NodePrep task fails on a node, that node will not be used. Can I configure a retry on the NodePrep task?

    Cheers,
    Candide


    Tuesday, September 27, 2016 10:08 AM

All replies

  • If it is an on-premises cluster and all compute nodes are local machines, I would recommend handling installation and updates with one of the methods below:

    1. For a one-time installation, either put the installation step in the node template or run it through Clusrun.

    2. Or, if the application setup is hard to automate, capture an image and deploy from your own image.

    For per-job data, such as updating a resource the job needs, the NodePrep task is the right place. This task always runs when a resource is allocated to the job, so it can do work such as "net use" an SMB share and copy the needed data.
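
    As a rough illustration of that kind of per-job preparation, here is a minimal Python sketch of a script a NodePrep task could launch. It connects to a share with "net use" and copies data with robocopy. The share path, subdirectory, and local directory are hypothetical placeholders, and so are the function names. It also assumes Python is available on the compute nodes and that the job's credentials can reach the share.

```python
import subprocess

# robocopy reports success with exit codes 0-7; 8 and above mean a copy failure.
ROBOCOPY_FAILURE_THRESHOLD = 8


def build_prep_commands(share, source_subdir, local_dir):
    """Return the two commands to run: connect to the share, then copy from it."""
    source = share.rstrip("\\") + "\\" + source_subdir
    return [
        ["net", "use", share],                  # deviceless connection to the SMB share
        ["robocopy", source, local_dir, "/E"],  # /E copies subdirectories, including empty ones
    ]


def copy_succeeded(robocopy_exit_code):
    """Interpret a robocopy exit code: anything below 8 is treated as success."""
    return 0 <= robocopy_exit_code < ROBOCOPY_FAILURE_THRESHOLD


def prepare_node(share, source_subdir, local_dir, runner=subprocess.call):
    """Run the preparation steps in order; return True only if both succeed."""
    net_use, robocopy = build_prep_commands(share, source_subdir, local_dir)
    if runner(net_use) != 0:
        return False
    return copy_succeeded(runner(robocopy))
```

    A call such as prepare_node(r"\\fileserver\jobdata", "inputs", r"C:\jobdata") would then be the body of the script, with all three values replaced by real ones. The runner parameter exists only so the logic can be exercised without touching a real share.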

    And for your questions:

    1. Yes, the NodePrep task will only run on the nodes allocated to the job.

    2. A NodePrep/NodeRelease task will not be retried if it fails (which is different from a normal task). You need to handle the retry in your own script.
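
    One way to handle the retry yourself is to wrap the real setup command in a small script and use that script as the NodePrep command. The Python sketch below is only an illustration of the idea, not an HPC Pack feature. The function names, the attempt count, and the delay are all assumptions, and it presumes Python is installed on the compute nodes.

```python
import subprocess
import sys
import time


def run_with_retry(command, attempts=3, delay_seconds=5.0, ok_codes=(0,),
                   runner=subprocess.call, sleep=time.sleep):
    """Run command until its exit code is in ok_codes or attempts run out.

    Returns the exit code of the last attempt.
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = runner(command)
        if code in ok_codes:
            return code
        print(f"attempt {attempt}/{attempts} failed with exit code {code}",
              file=sys.stderr)
        if attempt < attempts:
            sleep(delay_seconds)  # wait before the next try
    return code


def main(argv):
    """Treat argv as the setup command; return 0 on success, non-zero otherwise."""
    if not argv:
        print("usage: prep_retry.py COMMAND [ARGS...]", file=sys.stderr)
        return 2
    return 0 if run_with_retry(argv) == 0 else 1
```

    To use it as a script, end the file with sys.exit(main(sys.argv[1:])) and pass the real setup command as arguments. The script exits with a non-zero code only after every attempt has failed, so the NodePrep task is reported as failed only at that point.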


    Qiufang Shi

    Wednesday, September 28, 2016 2:56 AM