New nodes not fully utilizing all their cores and not "finishing" properly

  • Question

  • We currently have a Compute Cluster 2003 setup with 12 nodes plus a head node, running 4-core blades with 6 GB of RAM each, for our developers' use. We recently acquired 10 new nodes: 12-core blades with 24 GB of RAM. All of them run Windows Server 2003 x64 R2. We took an image of one of the current nodes (Altiris, sysprepped, latest Windows updates) and deployed it to the new nodes. They automatically joined the cluster without a problem, so our developers began testing with a couple of them. The problems we're facing are:

       - Even though dozens of jobs are submitted, only 8 jobs (one per core) are assigned to the new node, even though it has 12 cores (Compute Cluster Administrator correctly sees all 12 CPUs).

       - Jobs successfully complete, but their status never changes to "completed", so we end up having to kill them manually (roughly as sketched below).
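
     For reference, this is roughly how we inspect and kill the stuck jobs from the command line; the job ID below is only a placeholder, not a real job from our queue.

     rem Show the current queue and each job's status.
     job list

     rem Cancel a job that has finished its work but is still shown as running.
     job cancel 12345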

     Any pointers would be very much appreciated.
    Tuesday, January 12, 2010 5:22 PM

Answers

  • Hi,

    I got more details on this issue:
    1) I was wrong earlier: heterogeneous clusters are in fact supported.
    2) In a heterogeneous cluster, the CCS 2003 scheduler will try to schedule jobs across all the nodes for load balancing if you don't specifically tell it which node to run a job on. So until the 4-core nodes are fully used, it won't fully use the 12-core nodes.

    3) Suggestion: can you run more jobs on your cluster so that the 4-core nodes get fully used?

    The following is a simple example:

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     1     3
    R25-3399FHN01-3     READY         2     0     2

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     2     2
    R25-3399FHN01-3     READY         2     0     2

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     2     2
    R25-3399FHN01-3     READY         2     1     1

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     3     1
    R25-3399FHN01-3     READY         2     1     1

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     3     1
    R25-3399FHN01-3     READY         2     2     0

    > job submit /exclusive:false ping -t localhost
    >node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    R25-3399F1001-3     READY         4     4     0
    R25-3399FHN01-3     READY         2     2     0
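
    If it helps, here is one way to generate enough short, non-exclusive jobs to saturate the 4-core nodes so the remaining ones spill onto the 12-core nodes. This is only a rough sketch: the count of 60 is arbitrary (use something larger than the total core count of your 4-core nodes), and I swapped ping -t for ping -n 30 so the test jobs finish on their own.

    rem From an interactive command prompt: submit 60 short, non-exclusive test jobs.
    for /L %i in (1,1,60) do job submit /exclusive:false ping -n 30 localhost

    rem Then watch the RUN column fill up on every node, including the 12-core ones.
    node list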



    Wednesday, January 20, 2010 7:33 PM

All replies

  • Can you do a "node list" from a command window & send the result?

    In V1 we don't support heterogeneous clusters. In V2 we do. I'm not too surprised that things aren't working as expected with 4 cores/node on the old hardware and 12 cores/node on the new hardware.

    Tuesday, January 12, 2010 7:27 PM
  • NODE_NAME         STATUS        MAX   RUN   IDLE
    0026558242E7-11   PENDING       12    0     12
    DAS13             READY         4     4     0
    DAS14             READY         4     2     2
    DAS15             READY         4     0     4
    DAS16             READY         4     0     4
    DAS17             READY         4     0     4
    DAS18             READY         4     2     2
    DAS19             READY         4     3     1
    DAS20             READY         4     4     0
    DAS21             READY         4     3     1
    DAS22             READY         4     4     0
    DAS23             READY         4     4     0
    DAS24             READY         4     4     0
    DAS26             PAUSED        12    8     4
    DAS27             PAUSED        12    3     9
    DAS28             PENDING       12    0     12
    DAS29             PENDING       12    0     12
    DAS30             PAUSED        12    0     12
    DAS31             PENDING       12    0     12
    DAS32             PENDING       12    0     12


    DAS13 through DAS24 are the original cluster nodes; DAS26 through DAS32 are the new systems. DAS26 and DAS27 are the ones we're testing with.
    Wednesday, January 13, 2010 2:38 PM
  • Just to give you guys an update:

    Yesterday morning the whole compute cluster was acting up for the original nodes. New jobs wouldn't submit, and the few that were running at the time wouldn't cancel. We resolved this by restarting the whole cluster, including the two new 12-core nodes we're testing with.

    After the compute cluster restart, the new nodes (12 cores each) are now taking 4 jobs at a time (1 core per job) and are changing to a finished status when completed, freeing up resources properly.

    In summary: the new nodes are still only taking 4 jobs at a time, so they are not fully utilizing their 12 cores, but they are now freeing up resources when jobs complete.

    Thursday, January 14, 2010 2:45 PM
  • Hi,

    As Steve mentioned, in V1 (Compute Cluster Server 2003) heterogeneous clusters (where some nodes have more cores than others) were not supported, so the scheduler may behave in unexpected ways.

    I recommend that you move to V2 (HPC Server 2008), where heterogeneous clusters are supported.

    If you are not ready to go to V2 yet, you may have to split the cluster into two clusters so that each cluster is homogeneous.
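
    If you need an interim measure on the current cluster before splitting it, one option is to take the 12-core nodes out of scheduling until they have their own head node. This is only a sketch, assuming the node pause / node resume CLI verbs and using node names from your earlier listing:

    rem Stop the scheduler from placing new work on the 12-core nodes.
    node pause DAS28
    node pause DAS29

    rem Resume a node if you decide to keep it in this cluster after all.
    node resume DAS29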

    thanks,

    Liwei

    Thursday, January 14, 2010 7:44 PM
  • Thanks Liwei, do you know by any chance of any Microsoft articles discussing Compute Cluster V1 and unsupported heterogeneous environments? Appreciate the help.
    Thursday, January 14, 2010 8:16 PM
  • Liwei,

    I've been using the "asked nodes" flag during my testing because I want to make sure the jobs use one of the new hex-core nodes. Regardless, I've just followed your test in our current environment with the same results:

       - Submitted 12 jobs using askednodes, as follows:  "job submit /exclusive:false /askednodes:das29 ping -t localhost"
       - Only 4 jobs were assigned and the rest stayed in the queue (one of them failed, which is why only 11 jobs are listed under "job list")
       - The queued jobs get assigned to DAS29 right after I cancel any of the 4 running jobs

    NOTE:  I've removed 4 of the 7 new systems from the cluster for other testing, which is why only DAS29, DAS31, and DAS32 show up under node list.


    C:\Temp>job list
    ID         SubmittedBy      Name                               Status       Priority
    749930     ***********        ***********:Jan 26 2010  2:52P Running      Normal
    749931     ***********        ***********:Jan 26 2010  2:52P Running      Normal
    749932     ***********        ***********:Jan 26 2010  2:52P Running      Normal
    749933     ***********        ***********:Jan 26 2010  2:52P Running      Normal
    749935     ***********        ***********:Jan 26 2010  2:52P Queued       Normal
    749937     ***********        ***********:Jan 26 2010  2:52P Queued       Normal
    749938     ***********        ***********:Jan 26 2010  2:52P Queued       Normal
    749939     ***********        ***********:Jan 26 2010  2:53P Queued       Normal
    749940     ***********        ***********:Jan 26 2010  2:53P Queued       Normal
    749941     ***********        ***********:Jan 26 2010  2:53P Queued       Normal
    749936     ***********        ***********:Jan 26 2010  2:52P Queued       Normal
    C:\Temp>node list
    NODE_NAME           STATUS        MAX   RUN   IDLE
    DAS13             READY         4     0     4
    DAS14             READY         4     0     4
    DAS15             READY         4     4     0
    DAS16             READY         4     0     4
    DAS17             READY         4     2     2
    DAS18             READY         4     2     2
    DAS19             READY         4     0     4
    DAS20             READY         4     1     3
    DAS21             READY         4     1     3
    DAS22             READY         4     2     2
    DAS23             READY         4     0     4
    DAS24             READY         4     0     4
    DAS29             READY         12    4     8
    DAS31             PAUSED        12    0     12
    DAS32             PAUSED        12    0     12
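
    For reference, one further check (only a sketch; the /numprocessors switch and the ping command line here are assumptions, not something we have run yet) would be a single job that requests all 12 processors on DAS29, to see whether the scheduler is ever willing to hand that node more than 4 cores at once:

    rem Ask for all 12 cores on DAS29 in one job; ping is just a stand-in workload.
    job submit /numprocessors:12 /askednodes:das29 ping -n 30 localhost

    rem If the allocation succeeds, RUN for DAS29 should climb past 4.
    node list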
    Tuesday, January 26, 2010 9:05 PM