Adding workstation nodes over a private LAN

  • Question

  • I'm setting up a new HPC cluster using HPC Pack 2016. I have a server built and configured as the head node, with HPC Pack installed and validated. The network topology is '1 - compute nodes isolated on a private network'.

    I have a number of Windows 10 machines on the private network that I want to add as workstation nodes. HPC Pack installs fine on them, but I am unable to get them talking to the head node. The head node never discovers them when I try to add them in Cluster Manager, and if I start Cluster Manager on a node and connect back to the head node, I get an SSL error: "The remote certificate is invalid according to the validation procedure."

    The certificate is a self-signed one created on the head node, and it was successfully imported into the workstation nodes when HPC Pack was installed, so I am a little stumped. Does anyone know what I'm doing wrong?

    (Also, all firewalls are turned off for testing, so nothing should be blocked.)

    Thanks.

    Wednesday, April 8, 2020 4:07 PM

All replies

  • It might also be worth mentioning that the Windows 10 workstations are not in a domain. Not sure if that has any bearing.
    Wednesday, April 8, 2020 4:22 PM
  • Hi RetsofD,

    Having the head node in a domain while the workstation nodes are not in a domain is not supported. You may need to join the workstations to the domain before joining them to the head node. The non-domain-joined cluster is designed for Azure deployment. It is not recommended for on-premises clusters.

    Opening HPC Cluster Manager from a non-domain-joined client against a domain-joined cluster is also not supported.

    Regards,

    Yutong Sun

    Friday, April 17, 2020 2:48 PM
    Moderator
  • Thanks Yutong,

    That's pretty annoying; I didn't see it specified anywhere that you can't use non-domain workstations. Is that the case for all topologies?

    Having the workstations on the domain would be difficult. Is there any other way I can add them?

    Tuesday, April 21, 2020 2:15 PM
  • Hi RetsofD,

    Non-domain-joined workstation nodes and unmanaged server nodes are not supported. This applies to all topologies, both on-premises and in Azure.

    The only feasible option is to try non-domain-joined compute nodes for on-premises clusters besides Azure. Just make sure DNS is properly configured (or add the head node IP directly to the hosts file on the compute nodes) so that the non-domain-joined compute nodes can connect to the head node. They will then appear automatically in the non-domain compute node template and group. Jobs on them run under local users with the same name as the domain user.
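
    As a rough illustration of the hosts-file check described above, here is a minimal Python sketch. The IP address 10.0.0.1 and the name HEADNODE01 are placeholders, not values from this thread, and the function names are made up for the example. It only checks that a hosts file maps the head node name to the expected IP and that the OS resolver agrees; it does not test anything specific to HPC Pack.

```python
import ipaddress
import socket


def hosts_entry(ip: str, name: str) -> str:
    """Format one hosts-file line; raises ValueError if ip is not a valid address."""
    ipaddress.ip_address(ip)
    return f"{ip}\t{name}"


def has_hosts_entry(hosts_text: str, ip: str, name: str) -> bool:
    """Return True if hosts_text maps name to ip (comments and name case ignored)."""
    for line in hosts_text.splitlines():
        # Drop anything after '#', then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            if name.lower() in (f.lower() for f in fields[1:]):
                return True
    return False


def resolves_to(name: str, expected_ip: str) -> bool:
    """Return True if the OS resolver (hosts file or DNS) maps name to expected_ip."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False
    return any(info[4][0] == expected_ip for info in infos)


# Example use on a compute node (placeholder values, standard Windows hosts path):
#   text = open(r"C:\Windows\System32\drivers\etc\hosts").read()
#   print(has_hosts_entry(text, "10.0.0.1", "HEADNODE01"))
#   print(resolves_to("HEADNODE01", "10.0.0.1"))
```

    A successful name lookup only shows that the node can find the head node; as later replies in this thread show, the node can still fail to join for certificate or domain reasons.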

    Cheers,

    Yutong Sun

    Thursday, April 23, 2020 3:40 AM
    Moderator
  • Hi Yutong,

    Just to clarify, when you say 'besides Azure', do you mean when not using Azure?

    My entire cluster is on premise, and there is no integration with Azure or any other cloud solution.

    I have tried adding the head node IP to the hosts file on each workstation node, and they can ping the head node by name, so I don't think there are any DNS issues. However, they still do not join the cluster.

    Many thanks.

    Thursday, April 23, 2020 8:31 AM
  • Hi RetsofD,

    Non-domain-joined workstation nodes are simply NOT supported in HPC Pack, neither in on-premises clusters nor in Azure clusters.

    We only support non-domain-joined "compute nodes".

    If your machine is running a Windows Server OS (e.g. Windows Server 2016, Windows Server 2012 R2) and is not domain joined, you can install "HPC compute node" on it. The HPC setup application will detect that the machine is not domain joined and install it as a "non-domain joined compute node".

    If your machine is running a Windows client OS (e.g. Windows 10, Windows 8.1) and is not domain joined, you are NOT able to install "HPC workstation node" on it.

    • Marked as answer by RetsofD Monday, April 27, 2020 8:48 AM
    • Unmarked as answer by RetsofD Thursday, May 7, 2020 3:10 PM
    • Marked as answer by RetsofD Tuesday, May 12, 2020 3:51 PM
    Sunday, April 26, 2020 10:29 AM
  • OK thanks Sunbin Zhu,

    I will have to figure out a way of getting them on the domain. Many thanks for the clarification.

    Monday, April 27, 2020 8:50 AM
  • Hi Sunbin Zhu,

    I have now added one of the workstations to the domain; however, I get the exact same issue when trying to add it as a workstation node.

    It doesn't show up in Cluster Manager on the head node when I add it, so I cannot assign the template to it. If I try to open Cluster Manager on the client, I get the same SSL/TLS message.

    Any thoughts?

    Thursday, May 7, 2020 3:59 PM
  • Hi,

    I guess you are using different certificates on the head node and the workstation node.

    Could you open regedit.exe on both the head node and the workstation node, go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HPC, and check whether "SSLThumbprint" has the same value on both?

    If not, you are using different self-signed certificates. The reason may be that you didn't install \\headnode\REMINST\Certificates\HpcHnPublicCert.cer in the certificate store "Local Computer\Trusted Root Certification Authorities" on the workstation node.
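
    For anyone who would rather script this comparison than read it off in regedit, a minimal Python sketch follows. The registry key and value name are the ones given above; the function names are hypothetical, and it assumes Python happens to be available on the nodes, which HPC Pack itself does not require. The registry read only works on Windows; the comparison helpers run anywhere.

```python
def normalize_thumbprint(value: str) -> str:
    """Uppercase the value and keep only hex digits, so spacing, case,
    or stray invisible characters from copy and paste are ignored."""
    return "".join(ch for ch in value.upper() if ch in "0123456789ABCDEF")


def thumbprints_match(head_value: str, node_value: str) -> bool:
    """Return True if both values contain the same non-empty thumbprint."""
    head = normalize_thumbprint(head_value)
    node = normalize_thumbprint(node_value)
    return bool(head) and head == node


def read_local_ssl_thumbprint() -> str:
    """Read SSLThumbprint from HKLM\\SOFTWARE\\Microsoft\\HPC (Windows only)."""
    import winreg  # imported here so the helpers above also load on other systems

    # KEY_WOW64_64KEY asks for the 64-bit registry view even from 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                        r"SOFTWARE\Microsoft\HPC", 0, access) as key:
        value, _value_type = winreg.QueryValueEx(key, "SSLThumbprint")
    return str(value)


# Example use: run read_local_ssl_thumbprint() on each machine, then compare
# the two strings with thumbprints_match(head_value, node_value).
```

    Note that a matching thumbprint only confirms that both machines point at the same communication certificate; it says nothing about whether that certificate is trusted on the workstation node.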

    • Marked as answer by RetsofD Tuesday, May 12, 2020 3:50 PM
    Saturday, May 9, 2020 9:17 AM
  • The communication certificate was the correct one, but I hadn't installed HpcHnPublicCert.cer. Now that I have added it, I can add the nodes!

    So a combination of adding the workstation nodes to the domain and installing the extra certificate fixed it.
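
    For reference, one way to script the certificate import is with the standard Windows certutil tool, sketched below in Python. This is an assumption about how one might do it, not a record of how it was done in this thread; the import can equally be done by hand in the certificates MMC snap-in. "headnode" in the path is a placeholder for the real head node name, the function names are hypothetical, and the command needs an elevated prompt on the workstation node.

```python
import subprocess
from typing import List


def addstore_command(cert_path: str) -> List[str]:
    """Build the certutil call that adds cert_path to the local machine
    "Root" store, i.e. Trusted Root Certification Authorities."""
    return ["certutil", "-addstore", "Root", cert_path]


def install_head_node_cert(cert_path: str) -> None:
    """Run certutil (Windows only, elevated); raises CalledProcessError on failure."""
    subprocess.run(addstore_command(cert_path), check=True)


# Example use on a workstation node ("headnode" is a placeholder name):
#   install_head_node_cert(r"\\headnode\REMINST\Certificates\HpcHnPublicCert.cer")
```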

    Many thanks for the help!


    • Edited by RetsofD Tuesday, May 12, 2020 3:53 PM
    Tuesday, May 12, 2020 3:51 PM