Certificate exception when attempting to connect to HPC cluster

    Question

  • I am using version 5.1.6088 of the HPC SDK to connect to an HPC Pack 2016 Update 1 cluster. Some client machines can connect to the cluster, but others throw the following exception when attempting to create a session:

    System.ArgumentNullException: Value cannot be null.
    Parameter name: findValue
       at System.Security.Cryptography.X509Certificates.X509Certificate2Collection.FindCertInStore(SafeCertStoreHandle safeSourceStoreHandle, X509FindType findType, Object findValue, Boolean validOnly)
       at System.Security.Cryptography.X509Certificates.X509Certificate2Collection.Find(X509FindType findType, Object findValue, Boolean validOnly)
       at Microsoft.Hpc.WcfChannelModule.GetCertDnsIdentityName(String thumbPrint, StoreName storeName, StoreLocation storeLocation)
       at Microsoft.Hpc.Scheduler.Session.Internal.SoaHelper.CreateEndpointAddress(Uri uri, Boolean secure, Boolean certIdentity)
       at Microsoft.Hpc.Scheduler.Session.CredUtil.<GetCredTypeFromClusterAsync>d__4.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.CredUtil.<GetCredTypeFromClusterAsync>d__3.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.Internal.OnPremiseSessionFactory.<CreateSession>d__0.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.V3Session.<CreateSessionAsync>d__16.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.Session.<CreateSessionAsync>d__25.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.Session.CreateSession(SessionStartInfo startInfo, Binding binding)

    The binding is set to the default, and the SessionStartInfo object has a username and password assigned to it. I tried installing the HPC certificate on the client machine that encounters the exception, but that does not seem to resolve it. The HPC cluster is joined to a domain, but the client using the SDK is not.

    Wednesday, January 17, 2018 9:27 PM

All replies

  • Hi,

    The certificate is used for HPC Pack AAD integration support. As you are not using AAD integration, you do not need to install any certificate to make your client work.

    Are you trying to connect to the session service using the net.tcp binding from a non-domain-joined client machine? I believe this is not a scenario we support. Please consider using the http binding when creating sessions instead.

    Thanks,
    Zihao

    Thursday, January 18, 2018 5:03 AM
  • I changed the TransportScheme from NetTcp to Http, and I am still seeing the same error. I verified that the Binding in use was an HttpBinding. Additionally, for machines within the domain, the jobs that fail never appear to return any responses to SOA requests. The requests are all reported as Failed in Cluster Manager, but the caller hangs while iterating over the results of the GetResponses call.
    Thursday, January 18, 2018 5:46 PM
  • Hi,

    Thank you for your feedback. For the first issue, we will check whether this is an SDK regression and let you know once we have investigated and fixed it.

    For the second issue, could you share the following with us:

    1. Broker worker logs, located in %CCP_LOGROOT_SYS%SOA\HpcBrokerWorker_*.bin on the head node.
    2. Session logs, located in %CCP_LOGROOT_SYS%SOA\HpcSession_*.bin on the head node.

    You can send the logs to hpcpack@microsoft.com.

    Thanks,
    Zihao

    Friday, January 19, 2018 2:35 AM
  • Hi,

    Thank you for your patience.

    Connecting to an on-premises, domain-joined cluster from a non-domain-joined client over Net.Tcp is not a scenario we currently support. We are considering adding this feature in a future release. The http binding, however, works in our tests. Could you set SessionStartInfo.TransportScheme to TransportScheme.Http and retry? If it still does not work, please collect the client logs as described in our previous email and send them to us.
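
    A minimal sketch of that setting, assuming the client references the HPC Pack SDK assembly that provides Microsoft.Hpc.Scheduler.Session. The head node name, service name, and credentials are hypothetical placeholders, not values from this thread:

```csharp
using System;
using Microsoft.Hpc.Scheduler.Session;

class HttpSessionSketch
{
    static void Main()
    {
        // Hypothetical placeholders: substitute your own head node and SOA service name.
        string headNode = "my-headnode";
        string serviceName = "MyService";

        SessionStartInfo startInfo = new SessionStartInfo(headNode, serviceName);

        // Select the HTTP transport instead of the default Net.Tcp transport.
        startInfo.TransportScheme = TransportScheme.Http;

        // Explicit credentials, because the client is not joined to the cluster's domain.
        // Hypothetical values; do not hard-code real passwords.
        startInfo.Username = @"DOMAIN\user";
        startInfo.Password = "password";

        // CreateSession is the call that throws in the stack trace from the question.
        using (Session session = Session.CreateSession(startInfo))
        {
            Console.WriteLine("Created session {0}", session.Id);
        }
    }
}
```

    If the same ArgumentNullException is still thrown with this setting, the client logs become the next thing to look at.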

    Thanks,
    Zihao

    Thursday, January 25, 2018 7:05 AM