domain rebuilt, nodes showing multiple states

    Question

  • Through a comedy of errors, we had to rebuild our entire domain. Now our HPC managers are having trouble recognizing and connecting to the nodes. Nodes are showing up as both Error and Unapproved. I suspect the Error entries are the original domain accounts and the Unapproved entries are the new ones. Unfortunately, they seem to be confusing the server: I can't remove either set, and I end up with circular error messages:

    PS C:\Windows\system32> set-hpcnodestate -force -state offline -name dev-076
    WARNING: NOTE: All tasks that are running on the selected nodes are being canceled and requeued.
    set-hpcnodestate : The node DEV-076 could not be taken offline because it is not currently part of the cluster
    (its state is Unknown).

    PS C:\Windows\system32> Remove-HpcNode -name dev-076
    WARNING: Skipped node DEV-076. Only nodes in the Offline, Unknown or Rejected states can be removed.

    Other than rebuilding the server, is there anything we can do to get out of this mess?

    Monday, March 21, 2016 4:27 PM

All replies

  • Hi, Jason,

    What do you mean by rebuilding the entire domain? Did you reinstall the domain controller, or did the domain name change, with all HPC nodes (head nodes and compute nodes) joining the new domain?

    If the domain name has changed, we suggest that you reinstall the HPC head node. All compute nodes should then be added to the HPC cluster automatically as Unapproved, and you can assign a node template to them.

    Thanks,

    Yongjun

    Tuesday, March 22, 2016 1:44 AM
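
    The last step in the reply above (assigning a node template to the Unapproved nodes) could be sketched in HPC PowerShell roughly as follows. This is only a sketch under assumptions: it presumes an HPC PowerShell session on the reinstalled head node, and the template name "Default ComputeNode Template" is an assumed example that may not match the templates defined on your cluster.

    ```powershell
    # List the nodes that re-registered with the reinstalled head node.
    Get-HpcNode -State Unapproved

    # Look up the node template to apply (the name here is an assumption;
    # run Get-HpcNodeTemplate with no arguments to see what exists).
    $template = Get-HpcNodeTemplate -Name "Default ComputeNode Template"

    # Assign the template to every Unapproved node.
    Get-HpcNode -State Unapproved | Assign-HpcNodeTemplate -Template $template

    # Once provisioning has finished and the nodes report Offline, bring them online.
    # Note: this acts on ALL Offline nodes; narrow it with -Name if that is too broad.
    Get-HpcNode -State Offline | Set-HpcNodeState -State Online
    ```

    Checking the output of the first command before assigning the template is a cheap way to confirm that only the expected nodes are affected.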