Second failover headnode install fails - cannot start services

Question
I have created a failover cluster with two servers and a file server, running HPC Pack 2012 R2 (4.5.5079.0).
The first headnode installed OK: all services show as online in Failover Cluster Manager, and the headnode shows as online in HPC Cluster Manager.
When installing the second headnode I select "Add a new headnode to an existing failover cluster" and follow the High Availability steps.
The installer runs and fails at "start services".
The HPCManagement service does not appear to start, and I see this error in the event logs under HPC/Management/Admin:
"Connection Failed. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [static_IP_of_my_failover_fileserver]:9893"
I cannot telnet to that IP and port from the second headnode (should I be able to?)
I can only telnet that IP and port from the first headnode, which is the active node in the cluster.
Any ideas?
Thanks
Tim
PS I also tried uninstalling the HPC headnode on both headnodes, failing WSFC over to the inactive second node, and repeating. Same result.
- Edited by TimJRoberts1 Tuesday, March 29, 2016 3:12 PM
Tuesday, March 29, 2016 2:44 PM
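The telnet test described in the question can also be scripted, which makes it easier to repeat from each headnode. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is made up for illustration, and the only value taken from the thread is port 9893.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This is roughly what `telnet host port` tests: whether something is
    listening on the port and the network path to it is open.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running `can_connect("<cluster IP>", 9893)` on each headnode in turn would show whether the address is reachable from that node. A `False` result does not distinguish between a firewall block, a missing listener, and an address that is not routed to any host.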
Answers
One thing I had not mentioned was that this is running in AWS. The solution was to add the cluster's IP address as secondary private IP addresses on both headnodes via the AWS API. It all works now, and I have two headnodes in a cluster.
Thanks
Tim
- Proposed as answer by qiufang shi (Microsoft employee) Friday, April 1, 2016 10:21 AM
- Marked as answer by TimJRoberts1 Friday, April 1, 2016 1:43 PM
Friday, April 1, 2016 7:15 AM
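For readers who want to script the fix described in the answer, here is a rough sketch of how a secondary private IP address might be assigned to a headnode's network interface with boto3. The thread does not say which tool or call was actually used, so everything here is an assumption: the helper names, the interface ID, the region, and the choice of `AllowReassignment=False` are all illustrative.

```python
import ipaddress


def build_assign_request(eni_id, cluster_ips, allow_reassignment=False):
    """Build keyword arguments for the EC2 AssignPrivateIpAddresses call.

    eni_id: ID of the headnode's network interface (hypothetical value below).
    cluster_ips: the failover cluster IP address(es) to add as secondary IPs.
    """
    # Validate each address early; ip_address raises ValueError on bad input.
    ips = [str(ipaddress.ip_address(ip)) for ip in cluster_ips]
    return {
        "NetworkInterfaceId": eni_id,
        "PrivateIpAddresses": ips,
        "AllowReassignment": allow_reassignment,
    }


def assign_cluster_ips(eni_id, cluster_ips, region):
    """Assign the addresses via boto3 (third-party; needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.assign_private_ip_addresses(
        **build_assign_request(eni_id, cluster_ips)
    )
```

A call might look like `assign_cluster_ips("eni-0123456789abcdef0", ["10.0.1.50"], "us-east-1")`, where the interface ID, address, and region are placeholders. Which cluster address belongs on which headnode depends on the subnet layout, which the thread does not describe.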
All replies
Hi Tim,
It seems the firewall on the active head node is blocking the incoming connection from the secondary head node.
During installation of the active head node, some firewall rules are automatically added and enabled. Can you check the firewall settings on your active head node? The firewall rules added by the HPC Pack installation have names that begin with "HPC". Port 9893 should be allowed by the rule "HPC SDM Store Service (TCP-In)".
Please also pack the setup logs under C:\Windows\Temp\HPCSetupLogs from both the active head node and the secondary head node, and send them to me at suzhu@microsoft.com
Thanks,
Sunbin
Wednesday, March 30, 2016 2:36 AM
Tim, great to hear that it all works for you now.
Qiufang Shi
Friday, April 1, 2016 10:21 AM