not able to assign templates to new nodes

  • Question

  • When I try to assign a template to a new node, we get the error below in the GUI and the errors below in the management logs. At first I thought it was a misconfigured network adapter, but when I run ipconfig /all I don't see the 169 addresses at all, only the normal static address. I also checked Device Manager and I don't see any unknown interfaces that could cause this. We are using teamed network interfaces, and the static IP is assigned to the virtual interface. I've rebooted the head node and it is still giving me these errors. Any ideas?

    Thanks,

    Nicki

    The output below has been edited to remove the hostname and the real IP address.

    Time Message
    6/7/2017 11:19:41 AM Could not contact node '<headnode>' to perform change. Connection Failed. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 169.254.255.151:6730

    When I look at the management logs on the head node I see these errors:

    [ConnectionHelper] Resolved host <headnode> to IP 169.254.255.151  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP <real IP>  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP 169.254.250.225  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP 169.254.119.59  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Connecting to node manager service on host <headnode> with uri sdm://169.254.255.151:6730/Microsoft.Ccp.ComputeNodeConfiguration  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Connecting to node manager service on host <headnode> with uri sdm://<real ip>:6730/Microsoft.Ccp.ComputeNodeConfiguration  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Connecting to node manager service on host <headnode> with uri sdm://169.254.250.225:6730/Microsoft.Ccp.ComputeNodeConfiguration  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Connecting to node manager service on host <headnode> with uri sdm://169.254.119.59:6730/Microsoft.Ccp.ComputeNodeConfiguration  
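
The log lines above show the head node name resolving to several 169.254.x.x (link-local/APIPA) addresses alongside the real one. As a rough sketch, assuming the log has been saved as plain text with the same "Resolved host ... to IP ..." wording, the following Python helper pulls out the link-local resolutions. The function name `link_local_resolutions` is made up for illustration and is not part of HPC Pack.

```python
import ipaddress
import re

# Matches the "Resolved host ... to IP ..." lines as they appear in the log above.
RESOLVED = re.compile(r"Resolved host (\S+) to IP (\S+)")


def link_local_resolutions(log_lines):
    """Return (host, ip) pairs whose IP is link-local (169.254.0.0/16 for IPv4)."""
    flagged = []
    for line in log_lines:
        match = RESOLVED.search(line)
        if not match:
            continue
        host, ip_text = match.groups()
        try:
            address = ipaddress.ip_address(ip_text)
        except ValueError:
            # Redacted placeholders such as "<real IP>" do not parse; skip them.
            continue
        if address.is_link_local:
            flagged.append((host, ip_text))
    return flagged
```

Passing the lines of the management log to this function lists every link-local address the service believes belongs to the head node, which can then be compared with what ipconfig /all reports.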




    • Edited by nicka345 Wednesday, June 7, 2017 8:40 PM
    Wednesday, June 7, 2017 8:38 PM

Answers

  • This was caused by a hardware refresh: some of the old server information, specifically the old NIC MAC addresses, was still in the external management database. When we did the refresh we ran setup.exe -keepdata with the external database connection strings, and then ran import-hpcconfiguration.

    To fix this I had to do a reinstall. Here is what I did.

    I backed up the database and uninstalled the head node. On the reinstall I used the GUI (not the command line -keepdata) so that it blows away the external databases. Then I ran import-hpcconfiguration and restored all of the databases except the management database. This worked. The only problem was that I had to re-add all of the nodes, but we only have about 300 nodes and I already had a list of the groups they were in, so it wasn't too bad.

    • Marked as answer by nicka345 Thursday, June 15, 2017 1:15 PM
    Thursday, June 15, 2017 1:15 PM

All replies

  • Hi Nicka,

    Can you ping <head node name> to see which IP address it resolves to, and can you check the hosts file?

    Thursday, June 8, 2017 8:25 AM
  • Well, the /etc/hosts file on the head node did not have the enterprise.headnode entry, so I added it. Ping from the client goes to the correct head node IP, and nslookup is correct. It is still getting the same error. I checked the management logs on the client, and it resolves the head node to the correct name, so I really think it is something on the head node. I just don't understand where it would be getting this 169 address, especially since it doesn't show up in ipconfig /all.
    Thursday, June 8, 2017 2:16 PM
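
One way to check what the operating system's resolver returns for the head node name, independent of ping and nslookup, is sketched below in Python. It is only an illustration: `resolve_all` and `classify` are hypothetical helper names, and the addresses HPC Pack uses may come from its own stored data rather than from this resolver, so a clean result here does not rule out the problem.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Return every distinct address the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def classify(addresses):
    """Split address strings into (link_local, other) lists."""
    link_local, other = [], []
    for text in addresses:
        # Drop an IPv6 zone suffix such as "%12" before parsing.
        ip = ipaddress.ip_address(text.split("%")[0])
        if ip.is_link_local:
            link_local.append(text)
        else:
            other.append(text)
    return link_local, other
```

Running `classify(resolve_all(socket.gethostname()))` on the head node shows whether any 169.254.x.x address comes back from name resolution there.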
  • OK, as the management log shows:

    [ConnectionHelper] Resolved host <headnode> to IP 169.254.255.151  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP <real IP>  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP 169.254.250.225  
    06/07/2017 20:25:44.697 i HpcManagement 5164 10184 [ConnectionHelper] Resolved host <headnode> to IP 169.254.119.59  
    06/07/2017 20:25:44.697

    it will try the resolved IPs one by one. So what is the final status of the assign node template operation?

    If possible, please send the HpcManagement logs from the head node and that compute node to us by email at hpcpack@microsoft.com

    Thursday, June 8, 2017 2:24 PM
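
The "one by one" behaviour described in this reply can be approximated from outside HPC Pack with a plain TCP check. The Python sketch below is an assumption-laden illustration, not the service's actual connection logic: `first_reachable` is a made-up name, and it only tests whether something accepts a TCP connection on each address (port 6730 is the port shown in the log URIs).

```python
import socket


def first_reachable(candidates, timeout=3.0):
    """Try each (host, port) in order; return the first that accepts a TCP
    connection, or None if every attempt fails."""
    for host, port in candidates:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Timed out, refused, or unreachable: move on to the next candidate.
            continue
    return None
```

For example, calling it with the four addresses from the log, each paired with port 6730, shows which of them (if any) answers and roughly how long the unreachable 169.254.x.x entries delay the attempt.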
  • Sent.

    Thanks,

    Nicki

    Thursday, June 8, 2017 3:57 PM