Security Cryptography X509 Certificates error not allowing HPC Management Service to start

    Question

  • HPC Pack 2016 running on Windows Server 2012 R2

     

    My head node's health state shows Error.

    Node Connectivity: HPC Management Service or Node Manager Service unreachable. 

    HPC Node Manager Service won't start. Error 1067: The process terminated unexpectedly.

    Output from Windows Event Logs:

    Application: HpcNodeManager.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.InvalidOperationException
       at System.ServiceModel.Security.SecurityUtils.GetCertificateFromStoreCore(System.Security.Cryptography.X509Certificates.StoreName, System.Security.Cryptography.X509Certificates.StoreLocation, System.Security.Cryptography.X509Certificates.X509FindType, System.Object, System.ServiceModel.EndpointAddress, Boolean)
       at System.ServiceModel.Security.SecurityUtils.GetCertificateFromStore(System.Security.Cryptography.X509Certificates.StoreName, System.Security.Cryptography.X509Certificates.StoreLocation, System.Security.Cryptography.X509Certificates.X509FindType, System.Object, System.ServiceModel.EndpointAddress)
       at System.ServiceModel.Security.X509CertificateRecipientServiceCredential.SetCertificate(System.Security.Cryptography.X509Certificates.StoreLocation, System.Security.Cryptography.X509Certificates.StoreName, System.Security.Cryptography.X509Certificates.X509FindType, System.Object)
       at Microsoft.Hpc.WcfChannelModule.CreateWcfChannel[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Object, System.ServiceModel.Channels.Binding, System.ServiceModel.ServiceAuthorizationManager, System.ServiceModel.Description.PrincipalPermissionMode, System.ServiceModel.Description.ServiceThrottlingBehavior, System.String, System.String)
       at Microsoft.Hpc.WcfChannelModule+<SetupInternalWcfChannelAsync>d__12`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
       at Microsoft.Hpc.NodeManager.RemotingCommunicator.RemotingNMCommImpl+<SetupWCFChannel>d__49.MoveNext()
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
       at Microsoft.Hpc.NodeManager.RemotingCommunicator.RemotingNMCommImpl.StartServiceAndHeartbeat()
       at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
       at System.Threading.ThreadHelper.ThreadStart()


    • Edited by BruceIsHere Tuesday, November 14, 2017 1:27 AM Changed the service name
    Tuesday, November 14, 2017 1:26 AM

All replies

  • Hi,

    Was this cluster working before, or was HPC Pack just installed?

    This is likely because the node manager can't find the corresponding certificate. Could you check whether the certificate you specified during setup actually exists in LocalMachine\My? If it does, you can find the node manager logs, named like HpcNodeManager_000000.bin, at %ccp_home%\Data\LogFiles\Scheduler and

    • Download logviewer here and use it to inspect those logs, or
    • Package these logs and send them to hpcpack@microsoft.com
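
    If it helps, packaging the logs could look roughly like the sketch below. It is only an illustration, not an official HPC Pack tool: the function name package_node_manager_logs and the archive name are made up for this example, and the file pattern is assumed from the HpcNodeManager_000000.bin naming mentioned above.

```python
import zipfile
from pathlib import Path


def package_node_manager_logs(log_dir, archive_path, pattern="HpcNodeManager_*.bin"):
    """Zip every file in log_dir that matches pattern.

    Hypothetical helper for illustration; the default pattern is an
    assumption based on the HpcNodeManager_000000.bin naming.
    Returns the sorted list of file names written to the archive.
    """
    log_dir = Path(log_dir)
    # Collect only regular files so stray directories are skipped.
    log_files = sorted(p for p in log_dir.glob(pattern) if p.is_file())
    if not log_files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {log_dir}")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for log_file in log_files:
            # Store each log under its bare file name, without the directory path.
            archive.write(log_file, arcname=log_file.name)

    return [p.name for p in log_files]
```

    On the node itself this might be called as package_node_manager_logs(os.path.expandvars(r"%CCP_HOME%\Data\LogFiles\Scheduler"), "HpcNodeManager_logs.zip") after importing os, assuming CCP_HOME is set in that session.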

    Thanks,
    Zihao

    Thursday, November 16, 2017 7:27 AM