I found strange behavior while assigning jobs to particular nodes through a job template.
Here is our setup:
1 head node, Windows HPC Server 2008 R2 Service Pack 1
10 calculation nodes, Windows HPC Server 2008 R2 Service Pack 1
Let’s say the calculation node names start with “Calc”: Calc1, Calc2, Calc3...
1st Test
We created a directory “D:\Home” on the Calc7 node only and placed “Worker.exe” in it.
1. Right-click the Calc7 node, create a new node group, and name it “GroupA”
2. Create a new job template “CalcTemplate”
3. Add the Node Groups property to the template “CalcTemplate”
4. Select “GroupA” for both the default values and the required values of the Node Groups property
5. Create 300 jobs and add 5 tasks to each job
6. Specify “CalcTemplate” for all jobs
7. Submit the jobs to the Scheduler
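For anyone who wants to reproduce steps 5 to 7 from a script, here is a rough sketch of the command lines involved. It assumes the HPC Pack `job new`, `job add` and `job submit` commands with the `/jobtemplate:` and `/id:` parameters; please confirm the exact syntax with `job new /?` on your head node. The helper names and the job ID 42 are made up for illustration, and the code only builds the argument lists, it does not run anything.

```python
from typing import List


def new_job_command(template: str) -> List[str]:
    # Assumed syntax: job new /jobtemplate:<name>
    return ["job", "new", f"/jobtemplate:{template}"]


def add_task_commands(job_id: int, worker: str, tasks_per_job: int) -> List[List[str]]:
    # Assumed syntax: job add <jobID> <command>, repeated once per task
    return [["job", "add", str(job_id), worker] for _ in range(tasks_per_job)]


def submit_command(job_id: int) -> List[str]:
    # Assumed syntax: job submit /id:<jobID>
    return ["job", "submit", f"/id:{job_id}"]


if __name__ == "__main__":
    # 42 is a placeholder; the real ID is reported by "job new" when it runs.
    job_id = 42
    commands = [new_job_command("CalcTemplate")]
    commands += add_task_commands(job_id, r"D:\Home\Worker.exe", 5)
    commands.append(submit_command(job_id))
    for cmd in commands:
        print(" ".join(cmd))
```

In a real run each list would be passed to something like `subprocess.run`, and the whole sequence repeated 300 times, reading the job ID from the output of `job new` each time.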
Result:
Most tasks executed successfully, but typically 2 to 3 jobs
failed because they were assigned to the Calc1 node, which does not have
the “D:\Home” folder or “Worker.exe”,
even though we specified only the Calc7 node for “CalcTemplate”.
I also looked at Resource Selection in the view pane of the failed tasks: it shows Selected node groups “GroupA”,
but Allocated Nodes in the same pane shows the Calc1 node. Something must be wrong in HPC.
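To find the stray tasks without clicking through each one, the allocated node of every task can be compared against the nodes the template allows. Below is a minimal sketch of that check. It assumes the task records have already been exported somehow as (job ID, task ID, allocated node) tuples; the function name and the sample records are hypothetical, not real scheduler output.

```python
from typing import Iterable, List, Set, Tuple

# (job ID, task ID, allocated node name); the export step is assumed, not shown
TaskRecord = Tuple[int, int, str]


def find_misallocated(records: Iterable[TaskRecord],
                      allowed_nodes: Set[str]) -> List[TaskRecord]:
    # Compare node names case-insensitively, as Windows computer names are
    allowed = {name.lower() for name in allowed_nodes}
    return [rec for rec in records if rec[2].lower() not in allowed]


if __name__ == "__main__":
    # Made-up sample data for illustration only
    sample = [(101, 1, "Calc7"), (101, 2, "CALC7"), (102, 1, "Calc1")]
    print(find_misallocated(sample, {"Calc7"}))  # [(102, 1, 'Calc1')]
```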
2nd Test
The procedure is the same, but in this test we apply the Requested Nodes property in the job template instead of the Node Groups property.
1. Add the Requested Nodes property to the template “CalcTemplate”
2. Explicitly select the “Calc7” node for both the default values and the required values of that property
3. Create 300 jobs and add 5 tasks to each job
4. Specify “CalcTemplate” for all jobs
5. Submit the jobs to the Scheduler
Result: same as the 1st Test. Most tasks ran correctly, except for two or three tasks across the 300 jobs.
3rd Test
Apply both the Node Groups and Requested Nodes properties to the job template “CalcTemplate”
and specify the “Calc7” node only.
Result: Nodes now seem to be assigned correctly. We tested this with more than 20,000 jobs.
I believe a job should never be assigned to a node that is unrelated to the specified job template,
in either the 1st or the 2nd Test.