Bad connectivity with domain controller

  • Question

  • Hello,

    We have a Windows HPC Server 2008 cluster and are experiencing the following problem. The cluster is located at a site with an unstable Internet connection, so it periodically loses its connection to the domain controller. These outages last about 10 minutes. After the connection is lost, new jobs in the cluster queue fail with errors like this:

    Failed to start on node BATNOVCL1N4.  Error: There are currently no logon servers available to service the logon requestException of type 'Microsoft.Hpc.Activation.NodeManagerException' was thrown.

    The cluster log contains the following errors:
    Event Type:    Error
    Event Source:    NETLOGON
    Event Category:    None
    Event ID:    5719
    Date:        31.01.2010
    Time:        11:53:18
    User:        N/A
    Computer:    BATNOVSRV01.*******
    Description:
    No Domain Controller is available for domain ***** due to the following:
    There are currently no logon servers available to service the logon request. .
    Make sure that the computer is connected to the network and try again. If the problem persists, please contact your domain administrator.

    Event Type:    Warning
    Event Source:    HpcScheduler
    Event Category:    None
    Event ID:    0
    Date:        31.01.2010
    Time:        12:22:31
    User:        N/A
    Computer:    BATNOVSRV01.******
    Description:
    [RC] Start job and task 59403.176029 failed on node BATNOVCL1N1. Error: There are currently no logon servers available to service the logon requestException of type 'Microsoft.Hpc.Activation.NodeManagerException' was thrown.

    Sometimes even running jobs fail, with this error:

    Event Type:    Warning
    Event Source:    HpcScheduler
    Event Category:    None
    Event ID:    0
    Date:        31.01.2010
    Time:        16:02:28
    User:        N/A
    Computer:    BATNOVSRV01.*******
    Description:
    [RC] End job 43824 failed on node BATNOVCL1N8. Error: A call to SSPI failed, see inner exception.

    What can you recommend in this situation? Is there any way to relax the security authentication requirements and thus reduce the number of attempts to connect to the domain controller?





    Wednesday, February 3, 2010 7:09 AM

Answers

  • Hi Nikita
    HPC Server relies heavily on Active Directory, so if you have intermittent access to the domain controller you will always see problems submitting and running jobs (along with other management and submission operations that you may not have come across yet). My suggestion in this scenario is, if at all possible, to install a replica of the hosting domain at your site. Of course this may not be an option for a number of reasons, but a Read-Only Domain Controller can help work around operational (and political) constraints.
    Regards
    Dan
    • Marked as answer by Nikita Tropin Monday, February 15, 2010 10:31 AM
    Wednesday, February 3, 2010 12:15 PM

All replies

  • Dan,

    Can I install a Read-Only Domain Controller on my cluster head node running Windows HPC Server 2008? Or do I need some other edition of Windows Server 2008?
    Saturday, February 6, 2010 7:15 AM
  • Hey Nikita
    The simple answer is yes, you can use your headnode as a domain controller.
    Dan
    Monday, February 8, 2010 10:56 AM
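
Until a local domain controller is in place, it may help to check whether the existing one is reachable before submitting jobs, since the question reports that jobs queued during an outage fail. The following is only a minimal sketch of such a check, not part of HPC Server: the function names and the choice of ports (88 for Kerberos, 389 for LDAP) are assumptions made for illustration, and a successful TCP connection does not guarantee that a logon request will succeed.

```python
import socket

# Assumption: probing these two TCP ports is a reasonable proxy for
# "the domain controller is reachable". 88 = Kerberos, 389 = LDAP.
DC_PORTS = (88, 389)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def dc_reachable(host, ports=DC_PORTS, timeout=3.0):
    """Return True only if every probed port on the host accepts a connection."""
    return all(port_open(host, port, timeout) for port in ports)
```

A submission script could call `dc_reachable("dc.example.local")` (a hypothetical host name) and postpone submission while it returns `False`. This only avoids queuing new jobs during an outage; it does not protect jobs that are already running.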