Failed to assign jobs to particular nodes through a job template


  • I found strange behavior when assigning jobs to particular nodes through a job template.

    Here is our setup:

    1 head node: Windows HPC Server 2008 R2 Service Pack 1

    10 calculation nodes: Windows HPC Server 2008 R2 Service Pack 1

    The calculation nodes are named Calc1, Calc2, Calc3, and so on.

    1st Test

    We created a directory “D:\Home” on the Calc7 node only and placed “Worker.exe” in it.

    1.      Select the Calc7 node, right-click it, add it to a new node group, and name the group “GroupA”.

    2.      Create a new job template, “CalcTemplate”.

    3.      Add the Node Groups property to the template “CalcTemplate”.

    4.      Select “GroupA” as both the default value and the required value of the Node Groups property.

    5.      Create 300 jobs and add 5 tasks to each job.

    6.      Specify “CalcTemplate” for all jobs.

    7.      Submit the jobs to the scheduler.

    Result: Most tasks executed successfully, but typically 2 to 3 jobs failed because they were assigned to the Calc1 node, which does not have the “D:\Home” folder or “Worker.exe”.

    This happened even though “CalcTemplate” restricts jobs to “GroupA”, which contains only the Calc7 node.

    I also looked at Resource Selection in the view pane of the failed tasks, and it shows “GroupA” under Selected node groups.

    However, Allocated Nodes in the same pane shows the Calc1 node. Something appears to be wrong in the HPC scheduler.
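    The comparison above (the node set allowed by the template versus Allocated Nodes) can be sketched as a small Python check. This is a hypothetical helper, not part of Windows HPC Server: it assumes each task's Allocated Nodes value has already been collected into TaskRecord entries by some other means, and the sample records are made up for illustration, not our actual results.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical record type: one entry per task, holding the
    # node names shown under "Allocated Nodes" for that task.
    job_id: int
    task_id: int
    allocated_nodes: list[str]


def find_misplaced(records, allowed):
    """Return (job_id, task_id, stray_nodes) for every task that was
    allocated at least one node outside the `allowed` set."""
    misplaced = []
    for rec in records:
        stray = sorted(set(rec.allocated_nodes) - allowed)
        if stray:
            misplaced.append((rec.job_id, rec.task_id, stray))
    return misplaced


if __name__ == "__main__":
    # In the 1st test, "GroupA" contains only Calc7.
    allowed = {"Calc7"}
    # Made-up sample data for illustration only.
    records = [
        TaskRecord(1, 1, ["Calc7"]),
        TaskRecord(2, 3, ["Calc1"]),
        TaskRecord(3, 2, ["Calc7"]),
    ]
    for job_id, task_id, stray in find_misplaced(records, allowed):
        print(f"job {job_id} task {task_id} ran on {', '.join(stray)}")
```

    With the sample data, only job 2 task 3 is reported, because Calc1 is not in the allowed set. In the behavior we expect from the scheduler, this list would always be empty.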

    2nd Test

    The procedure is the same, but in this test we use the Requested Nodes property in the job template instead of the Node Groups property.

    1.      Add the Requested Nodes property to the template “CalcTemplate”.

    2.      Explicitly select the “Calc7” node as both the default value and the required value of that property.

    3.      Create 300 jobs and add 5 tasks to each job.

    4.      Specify “CalcTemplate” for all jobs.

    5.      Submit the jobs to the scheduler.

    Result: Same as the 1st test. Most tasks ran correctly, except for two or three tasks across the 300 jobs.

    3rd Test

    We applied both the Node Groups and the Requested Nodes properties to the job template “CalcTemplate”, specifying only the Calc7 node.

    Result: Nodes now appear to be assigned correctly. We tested this with more than 20,000 jobs.

     

    I believe the scheduler should never assign a node that is unrelated to the specified job template, in either the 1st or the 2nd test.

    Friday, May 20, 2011 8:19 AM