Second failover headnode install fails - cannot start services

    Question

  • I have created a failover cluster with two servers and a file server, using HPC Pack 2012 R2 4.5.5079.0.

    The first headnode installed OK - all services in Cluster Manager show online, and headnode shows as online in HPC Cluster Manager

    When installing the second headnode I select "Add a new headnode to an existing failover cluster" and follow the High Availability steps.

    The installer runs and then fails at the "start services" step.

    The HPCManagement service does not appear to start, and I see this error in the event logs under HPC/Management/Admin:

    "Connection Failed. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [static_IP_of_my_failover_fileserver]:9893"

    I cannot telnet to that IP and port from the second headnode (should I be able to?)

    I can only telnet to that IP and port from the first headnode, which is the active node in the cluster.

    Any ideas?

    Thanks

    Tim

    PS: I also tried uninstalling the HPC headnode on both headnodes, failing WSFC over to the inactive second node, and repeating the install. Same result.
    Tuesday, March 29, 2016 2:44 PM
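The telnet test described in the question can also be run without a telnet client. The following is a minimal sketch, assuming Python 3 is available on the headnode; the function name `can_connect` and the address in the usage comment are illustrative and do not come from the thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (hypothetical address; 9893 is the port from the error message):
#   can_connect("10.0.0.50", 9893)
```

Running it from both headnodes shows whether the port is reachable only from the active node, as reported above.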

Answers

  • One thing I had not mentioned is that this is running in AWS. The solution was to add the cluster's IP address as a secondary private IP address on both headnodes, via the AWS API. It all works now: I have two headnodes in a cluster.

    Thanks

    Tim

    Friday, April 1, 2016 7:15 AM
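The answer does not say which API call or tool was used. One possible way to add a secondary private IP address to an instance's network interface is the EC2 AssignPrivateIpAddresses action, sketched here on the assumption that Python with boto3 is used. The helper name `assign_secondary_ips`, the interface ID, and the address are placeholders, not values from the thread.

```python
def assign_secondary_ips(ec2_client, network_interface_id, private_ips):
    """Ask EC2 to add private_ips as secondary private addresses on one network interface.

    ec2_client is expected to be a boto3 EC2 client (an assumption of this sketch).
    """
    return ec2_client.assign_private_ip_addresses(
        NetworkInterfaceId=network_interface_id,
        PrivateIpAddresses=list(private_ips),
    )


# Example usage (requires boto3 and AWS credentials; all identifiers are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   assign_secondary_ips(ec2, "eni-0123456789abcdef0", ["10.0.1.50"])
```

Each address has to come from the subnet of the interface it is assigned to, so which addresses go on which headnode depends on how the cluster's subnets are laid out.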

All replies

  • Hi Tim,

    It seems the firewall on the active head node is blocking the incoming connection from the secondary head node.

    During installation of the active head node, some firewall rules are added and enabled automatically. Can you check the firewall settings on your active head node? The firewall rules added by the HPC Pack installation have names that start with "HPC". Port 9893 should be allowed by the rule "HPC SDM Store Service (TCP-In)".

    Also, please zip the setup logs under C:\Windows\Temp\HPCSetupLogs from both the active head node and the secondary head node, and send them to me at suzhu@microsoft.com

    Thanks,

    Sunbin

    Wednesday, March 30, 2016 2:36 AM
  • Tim, great to hear that it all works for you now.


    Qiufang Shi

    Friday, April 1, 2016 10:21 AM