HPC Cluster Manager connection Problems (HPC Pack 2016 Update 3)

  • Question

  • Hello,

    our cluster is now running Windows HPC Pack 2016 Update 3. We can run the HPC Cluster Manager on the headnode, and on the compute and workstation nodes (connecting remotely to the headnode), without any issues.

    On other computers that are neither part of the cluster nor domain joined, we installed the client utilities, but connecting to the headnode through the HPC Cluster Manager throws the error below. The HPC Job Manager connects without any issues.

    The connection to the management service failed. detail error: Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted.
    ---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
    ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
    ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
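    The innermost exception in a chain like this is usually the one to act on. As a minimal sketch (the helper name `split_exception_chain` is hypothetical, and the input is only the inner part of the chain quoted above), the nested `--->` entries can be split to surface the root cause, which here is the certificate validation failure rather than a network problem:

```python
def split_exception_chain(message: str) -> list[tuple[str, str]]:
    """Split a .NET-style 'Outer ---> Inner' chain into (type, text) pairs."""
    chain = []
    for part in message.split("--->"):
        part = part.strip()
        if not part:
            continue
        # The exception type ends at the first ": "; the rest is the message.
        type_name, sep, text = part.partition(": ")
        if not sep:
            type_name, text = "", part
        chain.append((type_name, text.strip()))
    return chain


# Inner part of the error text quoted in the post.
ERROR = (
    "System.Net.Http.HttpRequestException: An error occurred while sending "
    "the request. ---> System.Net.WebException: The underlying connection "
    "was closed: Could not establish trust relationship for the SSL/TLS "
    "secure channel. ---> System.Security.Authentication."
    "AuthenticationException: The remote certificate is invalid according "
    "to the validation procedure."
)

chain = split_exception_chain(ERROR)
root_type, root_text = chain[-1]  # the last entry is the innermost cause
print(root_type)
print(root_text)
```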

    We followed the client instructions in "Manage Certificates for HPC Pack 2016 Update 2 or later version Cluster":

    "Certificate on Client: If you want to manage the cluster from a remote client, you need to import the cert to LocalMachine\My with private key and to CurrentUser\Root with public key, then specify the thumbprint in the reg key below:
    HKLM\SOFTWARE\Microsoft\HPC\0386B1198B956BBAAA4154153B6CA1F44B6D1016 = <thumbprint here>"

    With this setting the error still appears. When we instead set the registry value used on the compute and workstation nodes, "HKLM\SOFTWARE\Microsoft\HPC\SSLThumbPrint = <thumbprint here>", and change HKLM\SOFTWARE\Microsoft\HPC\CertificateValidationType to 0 or 1, we can connect to the headnode with both the HPC Job Manager and the HPC Cluster Manager. However, when we create a new job, the job owner is "NT AUTHORITY\SYSTEM" and the job cannot run.
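    For reference, a minimal sketch of how the SSLThumbPrint and CertificateValidationType changes described above could be prepared. It only prints `reg add` commands and does not touch the registry. Several things here are assumptions rather than facts from the thread: the value types (REG_SZ for the thumbprint, REG_DWORD for CertificateValidationType), the helper names, and the sample thumbprint, which is a made-up placeholder. Thumbprints copied from the certificate dialog often carry spaces or an invisible leading character, so the sketch strips everything that is not a hex digit first.

```python
import re

HPC_KEY = r"HKLM\SOFTWARE\Microsoft\HPC"  # key path quoted in the post


def normalize_thumbprint(raw: str) -> str:
    """Strip spaces and hidden characters; return 40 uppercase hex digits."""
    cleaned = re.sub(r"[^0-9A-Fa-f]", "", raw).upper()
    if len(cleaned) != 40:  # a SHA-1 thumbprint is 20 bytes
        raise ValueError(f"expected 40 hex digits, got {len(cleaned)}")
    return cleaned


def reg_add_command(name: str, value: str, value_type: str = "REG_SZ") -> str:
    """Build (not run) a 'reg add' command for a value under the HPC key."""
    return f'reg add "{HPC_KEY}" /v {name} /t {value_type} /d {value} /f'


# Hypothetical thumbprint as it might be pasted from the certificate dialog,
# including a hidden left-to-right mark and spaces.
SAMPLE = "\u200eab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01"

thumbprint = normalize_thumbprint(SAMPLE)
commands = [
    # Value types below are assumptions; check an existing node's registry.
    reg_add_command("SSLThumbPrint", thumbprint),
    reg_add_command("CertificateValidationType", "1", "REG_DWORD"),
]
for command in commands:
    print(command)
```

    The printed commands would have to be run from an elevated prompt on the client. As the post notes, this combination makes the connection work but leaves new jobs owned by "NT AUTHORITY\SYSTEM".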

    It seems that both the HPC Cluster Manager and the HPC Job Manager then ignore the user stored for the headnode in Credential Manager under Windows Credentials.

    Any suggestions on how to solve this?

    Thanks,

    Thomas

    Wednesday, August 14, 2019 7:23 AM

All replies

  • Hi Thomas,

    I don't think this is a supported scenario: a non-domain-joined client using HPC Cluster Manager to connect to a domain-joined head node. In this case, only HPC Job Manager can connect to the head node, via HTTPS, and submit jobs with a domain username and password.

    Can you join the client to the same domain as the head node?

    Regards,

    Yutong Sun 

    Tuesday, August 20, 2019 7:46 AM
    Moderator