Attempting to logon to Head Node through WCF throws [There are currently no logon servers available to service the logon request] after a few days

  • Question

  • I have an HPC Pack 2016 cluster with a single head node running on Windows Server 2012 R2. The head node has to be restarted every few days, because clients start reporting the following error: Microsoft.Hpc.Scheduler.Session.SessionException: Failed to get cluster property. The scheduler raised exception: System.ServiceModel.FaultException`1[Microsoft.Hpc.ExceptionWrapper]: There are currently no logon servers available to service the logon request.

    After restarting the server, things seem OK, but it is not great that we are intermittently unable to create sessions. Digging into the Event Viewer, I found that the head node starts raising event 4227 [TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.]

    Once this warning has been raised, the head node appears to be unable to connect to the domain controller, even though the domain controllers are up and responsive.

    A TechNet thread recommends decreasing the TcpTimeWaitDelay [https://social.technet.microsoft.com/Forums/ie/en-US/b632acdc-a546-4014-a299-4c27781e6c5a/tcpip-failed-to-establish-an-outgoing-connection-event-id-4227?forum=winserverPN], but I would like to know whether that is recommended, or whether specific OS patches are needed to make this issue go away.

    • Edited by KB_apl Thursday, August 29, 2019 5:16 PM provide OS info
    Thursday, August 29, 2019 4:45 PM
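
Event 4227 describes local ports being used up by connections that are opened and closed at a high rate. One way to check for that on the head node is to count TCP connections per state and see which remote endpoints hold the TIME_WAIT entries. The following is a minimal sketch, not a supported diagnostic: it assumes the usual column layout of `netstat -ano` output on Windows, and the function name, addresses, ports, and PIDs in the sample are all hypothetical.

```python
from collections import Counter

# Hypothetical sample in the usual `netstat -ano` layout; all values are made up.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:49152         10.0.0.10:389          TIME_WAIT       0
  TCP    10.0.0.5:49153         10.0.0.10:389          TIME_WAIT       0
  TCP    10.0.0.5:49154         10.0.0.20:5800         ESTABLISHED     4321
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       900
  UDP    0.0.0.0:123            *:*                                    1000
"""


def summarize_netstat(output):
    """Return (connections per TCP state, TIME_WAIT connections per remote endpoint)."""
    by_state = Counter()
    time_wait_by_remote = Counter()
    for line in output.splitlines():
        fields = line.split()
        # TCP rows are assumed to have five columns: Proto, Local, Foreign, State, PID.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, _local, remote, state, _pid = fields
        by_state[state] += 1
        if state == "TIME_WAIT":
            time_wait_by_remote[remote] += 1
    return by_state, time_wait_by_remote


if __name__ == "__main__":
    states, remotes = summarize_netstat(SAMPLE)
    for state, count in states.most_common():
        print(state, count)
    for remote, count in remotes.most_common(5):
        print("TIME_WAIT to", remote, count)
```

To use it on a real machine, the text captured from `netstat -ano` would be passed in place of `SAMPLE`. A TIME_WAIT count that keeps climbing toward the size of the dynamic port range between restarts would be consistent with the port exhaustion that event 4227 describes.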

All replies

  • I just realized that the version of HPC Pack that I am using is 5.1.6086.0. I will update it to 5.1.6114.0.
    Thursday, August 29, 2019 10:33 PM
  • Hi KB_apl,

    We have a known issue in HPC Pack 2016 Update 1 and Update 2 that causes a TCP port leak on the head node. We fixed it in HPC Pack 2016 Update 3 (5.3.6435.0), so please upgrade to that version if possible.


    Yutong Sun

    Saturday, August 31, 2019 7:45 AM
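
The reply above names 5.3.6435.0 as the version that contains the port leak fix. A small sketch of checking an installed version string against it follows. It assumes that version strings compare component by component as integers and that later builds keep the fix; the function names are hypothetical and are not part of HPC Pack.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.3.6435.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Version quoted in the thread as containing the TCP port leak fix (Update 3).
FIXED_IN = parse_version("5.3.6435.0")


def has_port_leak_fix(installed):
    """Assumption: any version at or above FIXED_IN includes the fix."""
    return parse_version(installed) >= FIXED_IN


if __name__ == "__main__":
    for version in ("5.1.6086.0", "5.1.6114.0", "5.3.6435.0"):
        print(version, has_port_leak_fix(version))
```

Both versions mentioned earlier in the thread (5.1.6086.0 and 5.1.6114.0) fall below the fixed version under this comparison, which matches the advice to move to Update 3 rather than to a later 5.1 build.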
  • Hi Yutong,

    I attempted to upgrade my single head node cluster from 5.1.6114.0 to 5.3.6435.0, and now all of my SOA jobs are failing. I followed the single head node instructions on the Update 3 page. When I send SOA requests, my clients get the following errors (for the first one I enabled IncludeExceptionDetailInFaults; for the second I did not):

    System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Could not load type 'Microsoft.Hpc.Scheduler.Session.SessionFault' from assembly 'Microsoft.Hpc.Scheduler.Session, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
    System.TypeLoadException: Could not load type 'Microsoft.Hpc.Scheduler.Session.SessionFault' from assembly 'Microsoft.Hpc.Scheduler.Session, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'.
       at Microsoft.Hpc.CcpServiceHosting.TraceServiceBehavior.BeforeSendReply(Message& reply, Object correlationState)
       at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.BeforeSendReplyCore(MessageRpc& rpc, Exception& exception, Boolean& thereIsAnUnhandledException))."

    System.ServiceModel.FaultException: The server was unable to process the request due to an internal error.  For more information about the error, either turn on IncludeExceptionDetailInFaults (either from ServiceBehaviorAttribute or from the <serviceDebug> configuration behavior) on the server in order to send the exception information back to the client, or turn on tracing as per the Microsoft .NET Framework SDK documentation and inspect the server trace logs.
       at Microsoft.Hpc.Scheduler.Session.BrokerResponse`1.ThrowIfFaultUnderstood(Message response, MessageFault fault, String action, MessageVersion version)
       at Microsoft.Hpc.Scheduler.Session.BrokerResponse`1.get_Result()

    I updated the HPC SDK to 5.3.6437 on both the client side and the service running on the cluster, and that made the SOA calls succeed. Is it a known requirement that all clients and SOA services be updated whenever the HPC cluster version is updated?

    I am also unable to view SOA Message Details. I currently get this error: Error loading message list for your SOA message. Please try again later.

    Even with updating the SDK across the board, I am still unable to view the SOA messages from my cluster.

    I checked the Event Viewer and the log files under C:\Program Files\Microsoft HPC Pack 2016\Data\LogFiles, but nothing stands out. What additional information should I provide to help diagnose this issue?

    Wednesday, September 11, 2019 11:00 PM