Windows HPC Multiple Head Nodes

    Question

  • Dear all,

    We are testing Windows HPC Pack 2016 and would like to know more about the maximum number of head nodes our cluster can have. Please help us with the following questions:

    1.  What's the maximum size of the cluster a single head node can handle?

    2. How many head nodes can a cluster have? If I have five head nodes, will all of them participate in the job scheduling process?

    3. Is a failover head node an active or a passive node? Does it work only when the primary head node goes down, or is it always active and sharing the load?

    4. If I have multiple head nodes, how does load balancing happen between them?

    Thanks a lot in advance.

    - Puneet


    Puneet Sharma

    Wednesday, March 08, 2017 11:03 PM

Answers

  • 1. It depends on your head node hardware spec. With higher RAM, CPU, and SQL capacity, a single head node should have no problem handling a large cluster.

    2. Currently we support only 3 head nodes, for simplicity. We will support more head nodes in a future release.

    3. Most of the services are active/passive, for example: scheduler, monitoring, SDM, stateful management service, and session. Some are stateless (one active instance on every node), for example: stateless management service, REST service, and Naming Service.

    4. Load balancing is currently disabled. An admin can move services between the three nodes manually through the Service Fabric PowerShell tool.


    Qiufang Shi

    Thursday, March 09, 2017 1:20 AM

All replies

  • Dear Qiufang,

    From your reply, I understand that even though we have 3 head nodes, only one head node will be active (submitting jobs, monitoring, etc.) at a given point in time. The other two head nodes will not be doing anything. Is that correct?

    Based on our requirements, we want to add multiple active head nodes as the cluster size grows. I believe we cannot do that with the current HPC Pack. Is that correct?

    Thanks,

    Puneet



    Puneet Sharma


    Wednesday, March 15, 2017 9:41 PM
  • Hi Puneet,

    Most of your understanding is correct. From our testing, SQL easily becomes the bottleneck. A few clarifications:

    1. Currently we support only 3 head nodes, for simplicity. We will support more in the future.

    2. The other two head nodes are actually doing work, because different services may be active on different head nodes. These services include:

        - Scheduler Service: schedules jobs

        - Monitoring Service: receives metric data from all compute nodes and aggregates it

        - Diagnostics Service

        - Stateful Management Service: performs management operations

        - Session Service: serves SOA jobs

    Meanwhile, there are stateless services that run on all head nodes and are active all the time. These include:

        - Stateless Management Service: serves client requests and client-triggered operations such as node offline/online ...

        - Scheduler REST Service: serves client job requests

        - Naming Service

    Because we have disabled load balancing, a service fails over only when it encounters an error. Thus you can manually distribute the primary instances of the services across the head nodes, for example the primary Scheduler Service on one head node and the other primary services on the other two, to get a better load balance between the three head nodes.
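
    To make the placement idea concrete, here is a minimal Python sketch. It is only a toy model, not HPC Pack or Service Fabric code: the node names (HN1 to HN3), the `Cluster` class, the short service labels, and the rule that a failed node's primaries land on the first healthy node are all assumptions made for illustration. It shows one primary per active/passive service, stateless services active on every healthy head node, manual moves, and failover only on error.

```python
from dataclasses import dataclass, field

# Hypothetical labels for illustration; not real HPC Pack or Service Fabric names.
STATEFUL = ["Scheduler", "Monitoring", "Diagnostics", "StatefulManagement", "Session"]
STATELESS = ["StatelessManagement", "SchedulerRest", "Naming"]


@dataclass
class Cluster:
    nodes: list
    primary: dict = field(default_factory=dict)  # stateful service -> node with its primary
    down: set = field(default_factory=set)       # head nodes that have failed

    def place_all_on(self, node):
        """Start with every primary on one head node."""
        for service in STATEFUL:
            self.primary[service] = node

    def move_primary(self, service, node):
        """Manual move by the admin; nothing rebalances automatically."""
        if node not in self.nodes or node in self.down:
            raise ValueError(f"{node} is not a healthy head node")
        self.primary[service] = node

    def fail_node(self, node):
        """Failover happens only on error (assumed target: first healthy node)."""
        self.down.add(node)
        healthy = [n for n in self.nodes if n not in self.down]
        if not healthy:
            raise RuntimeError("no healthy head node left")
        for service, owner in self.primary.items():
            if owner == node:
                self.primary[service] = healthy[0]

    def load(self):
        """Number of primaries hosted by each healthy head node."""
        counts = {n: 0 for n in self.nodes if n not in self.down}
        for owner in self.primary.values():
            counts[owner] += 1
        return counts

    def active_services(self, node):
        """Stateless services run everywhere; stateful ones only where the primary is."""
        if node in self.down:
            return []
        return STATELESS + [s for s, owner in self.primary.items() if owner == node]


cluster = Cluster(["HN1", "HN2", "HN3"])
cluster.place_all_on("HN1")
print(cluster.load())  # {'HN1': 5, 'HN2': 0, 'HN3': 0}

# The admin spreads the primaries by hand across the three head nodes.
cluster.move_primary("Monitoring", "HN2")
cluster.move_primary("Diagnostics", "HN2")
cluster.move_primary("StatefulManagement", "HN3")
cluster.move_primary("Session", "HN3")
print(cluster.load())  # {'HN1': 1, 'HN2': 2, 'HN3': 2}

# An error on HN2 triggers failover of only the primaries it hosted.
cluster.fail_node("HN2")
print(cluster.load())  # {'HN1': 3, 'HN3': 2}
```

    In the model, as in the description above, the distribution stays wherever the admin put it until a node fails, so after a failover the admin would need to move primaries again to even out the load.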


    Qiufang Shi

    Thursday, March 16, 2017 2:14 AM