New nodes not fully utilizing all their cores and not "finishing" properly

Question
-
We currently have a Windows Compute Cluster Server 2003 cluster with 12 nodes plus a head node, running on 4-core blades with 6 GB of RAM, for our developers' use. We recently acquired 10 new nodes: 12-core blades with 24 GB of RAM. All of them run Windows Server 2003 R2 x64. We took an image of one of the current nodes (Altiris, sysprepped, with the latest Windows updates) and deployed it to the new nodes. They joined the cluster automatically without a problem, so our developers began testing with a couple of them. The problems we're facing are:
- Even though dozens of jobs are submitted, only 8 jobs (one per core) are assigned to a new node, even though it has 12 cores (Compute Cluster Administrator correctly sees all 12 CPUs).
- Jobs complete successfully, but their status never changes to "Completed", and we end up having to kill them manually.
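For reference, the behavior can be reproduced from a command prompt by looping single-core, non-exclusive submissions at one of the new nodes and then watching node list; the node name, job count, and ping payload below are only illustrative:
> rem Queue 16 single-core, non-exclusive jobs on one of the new 12-core nodes
> for /l %i in (1,1,16) do job submit /exclusive:false /askednodes:DAS26 ping -n 60 localhost
> rem Then check how many of them the scheduler actually starts on that node
> node list
> job list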
Any pointers would be very much appreciated.
Tuesday, January 12, 2010 5:22 PM
Answers
-
Hi,
I have more details on this issue:
1) I was wrong earlier: heterogeneous clusters are supported.
2) In a heterogeneous cluster, the CCS 2003 scheduler will try to schedule jobs across all of the nodes for load balancing if you don't specify which node should run a job. So it won't fully use the 12-core nodes before the 4-core nodes are fully used.
3) Suggestion: can you run more jobs on your cluster so that the 4-core nodes can be fully used?
The following is a simple example:
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    1    3
R25-3399FHN01-3  READY   2    0    2
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    2    2
R25-3399FHN01-3  READY   2    0    2
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    2    2
R25-3399FHN01-3  READY   2    1    1
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    3    1
R25-3399FHN01-3  READY   2    1    1
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    3    1
R25-3399FHN01-3  READY   2    2    0
> job submit /exclusive:false ping -t localhost
> node list
NODE_NAME        STATUS  MAX  RUN  IDLE
R25-3399F1001-3  READY   4    4    0
R25-3399FHN01-3  READY   2    2    0
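To fully load this cluster, it is just a matter of submitting enough non-exclusive jobs: with twelve 4-core nodes (48 cores) plus the two new 12-core nodes being tested (24 cores), it takes on the order of 72 concurrent single-core jobs before the new nodes can be driven to full use. A rough sketch, with the job count and ping duration only illustrative:
> rem Submit 80 single-core, non-exclusive jobs (more than the roughly 72 cores available)
> for /l %i in (1,1,80) do job submit /exclusive:false ping -n 60 localhost
> rem Once the 4-core nodes are full, any remaining jobs can only be placed on the 12-core nodes
> node list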
- Marked as answer by Don Pattee (Moderator), Thursday, May 6, 2010 11:41 PM
Wednesday, January 20, 2010 7:33 PM
All replies
-
Can you do a "node list" from a command window & send the result?
In V1 we don't support heterogeneous clusters. In V2 we do. I'm not too surprised that things aren't working as expected with 4 cores/node on the old hardware and 12 cores/node on the new hardware.
Tuesday, January 12, 2010 7:27 PM
-
NODE_NAME        STATUS   MAX  RUN  IDLE
0026558242E7-11  PENDING  12   0    12
DAS13            READY    4    4    0
DAS14            READY    4    2    2
DAS15            READY    4    0    4
DAS16            READY    4    0    4
DAS17            READY    4    0    4
DAS18            READY    4    2    2
DAS19            READY    4    3    1
DAS20            READY    4    4    0
DAS21            READY    4    3    1
DAS22            READY    4    4    0
DAS23            READY    4    4    0
DAS24            READY    4    4    0
DAS26            PAUSED   12   8    4
DAS27            PAUSED   12   3    9
DAS28            PENDING  12   0    12
DAS29            PENDING  12   0    12
DAS30            PAUSED   12   0    12
DAS31            PENDING  12   0    12
DAS32            PENDING  12   0    12
DAS13 through DAS24 are the original cluster nodes; DAS26 through DAS32 are the new systems. DAS26 and DAS27 are the ones we're testing with.
Wednesday, January 13, 2010 2:38 PM
-
Just to give you guys an update:
Yesterday morning the whole compute cluster was acting up on the original nodes: new jobs wouldn't submit, and the few jobs running at the time wouldn't cancel. We resolved this by restarting the whole cluster, including the two new 12-core nodes we're testing with.
After the restart, the new nodes (12 cores each) are now taking 4 jobs at a time (1 core per job) and are changing to a finished status when completed, freeing up resources properly.
In summary: the new nodes are only taking 4 jobs at a time, so they're not fully utilizing their 12 cores, but they are now freeing up resources when jobs complete.
Thursday, January 14, 2010 2:45 PM
-
Hi,
As Steve mentioned, in V1 (Compute Cluster Server 2003), heterogeneous clusters (where some nodes have more cores than others) are not supported, so the scheduler may behave in unexpected ways.
I recommend that you use V2 (HPC Server 2008), where heterogeneous clusters are supported.
If you are not ready to go to V2 yet, then to work around the issue you may have to split the cluster into two clusters, so that each one is homogeneous.
thanks,
Liwei
Thursday, January 14, 2010 7:44 PM
-
Thanks, Liwei. Do you know of any Microsoft articles discussing Compute Cluster V1 and unsupported heterogeneous environments? Appreciate the help.
Thursday, January 14, 2010 8:16 PM
-
Liwei,
I've been using the "asked nodes" flag during my testing because I want to make sure the jobs use one of the new 12-core nodes. Regardless, I've just followed your test in our current environment, with the same results:
- Submitted 12 jobs using the asked node as follows: "job submit /exclusive:false /askednodes:das29 ping -t localhost"
- Only 4 jobs were assigned and the rest stayed in the queue (one of them failed, which is why only 11 jobs are listed under "job list")
- Queued jobs get assigned to DAS29 as soon as any of the 4 running jobs is canceled
NOTE: I've removed 4 of the 7 new systems from the cluster for other testing, so only DAS29, DAS31, and DAS32 show up under node list.
C:\Temp>job list
ID      SubmittedBy  Name                            Status   Priority
749930  ***********  ***********:Jan 26 2010 2:52P  Running  Normal
749931  ***********  ***********:Jan 26 2010 2:52P  Running  Normal
749932  ***********  ***********:Jan 26 2010 2:52P  Running  Normal
749933  ***********  ***********:Jan 26 2010 2:52P  Running  Normal
749935  ***********  ***********:Jan 26 2010 2:52P  Queued   Normal
749937  ***********  ***********:Jan 26 2010 2:52P  Queued   Normal
749938  ***********  ***********:Jan 26 2010 2:52P  Queued   Normal
749939  ***********  ***********:Jan 26 2010 2:53P  Queued   Normal
749940  ***********  ***********:Jan 26 2010 2:53P  Queued   Normal
749941  ***********  ***********:Jan 26 2010 2:53P  Queued   Normal
749936  ***********  ***********:Jan 26 2010 2:52P  Queued   Normal
C:\Temp>node list
NODE_NAME  STATUS  MAX  RUN  IDLE
DAS13      READY   4    0    4
DAS14      READY   4    0    4
DAS15      READY   4    4    0
DAS16      READY   4    0    4
DAS17      READY   4    2    2
DAS18      READY   4    2    2
DAS19      READY   4    0    4
DAS20      READY   4    1    3
DAS21      READY   4    1    3
DAS22      READY   4    2    2
DAS23      READY   4    0    4
DAS24      READY   4    0    4
DAS29      READY   12   4    8
DAS31      PAUSED  12   0    12
DAS32      PAUSED  12   0    12
Tuesday, January 26, 2010 9:05 PM
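As a further check on the test above (a sketch only; the /numprocessors switch is assumed here, since it does not appear elsewhere in this thread), a single job can ask DAS29 for all 12 processors at once, which shows whether the scheduler will ever allocate more than 4 cores on that node:
> rem Request all 12 processors on DAS29 in one job; /numprocessors is assumed to be available
> job submit /numprocessors:12 /askednodes:das29 /exclusive:false ping -n 60 localhost
> rem If this job never starts while DAS29 is otherwise idle, the scheduler is likely still limiting that node to 4 cores
> node list
> job list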