Windows Server 2008 R2 domain controllers hang periodically

  • Question

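  • Our network has two Windows Server 2008 R2 domain controllers (DC01, the primary DC, and DC02, the additional DC). For the past month, they have become impossible to log on to at intervals of no more than two days. When the failure occurs, both servers become inaccessible at the same time, whether through remote RDP or at the local console. Both servers were installed and configured at the same time (installed on 2010-11-06) on Dell R310 hardware (X3450 / 4 GB / 4 × 500 GB in RAID 10). I have taken the following troubleshooting steps so far:

    1. Checked the servers' hardware alerts: all devices report normal, and the System Administrator software that Dell ships with the server is installed and shows every status as normal. Hardware can therefore be ruled out as the cause, and the probability of both servers suffering a hardware failure at the same moment is very small.

    2. Installed the latest server updates, including SP1 and other related patches.

    3. Checked the event logs; so far I have not found any critical or fatal entries. The logs are attached below.

    4. Because the logs contain DNS errors, I tried reconfiguring the DNS zone for the domain, but that did not solve the problem.

    5. Swapped the roles between the primary and additional DCs (transferred the FSMO roles from DC01 to DC02); the problem still occurs.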

    • Edited by LingPing, August 31, 2012 4:19
    August 31, 2012 4:17

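Answer

  • Hello,

    Because a computer hang can have many causes, we suggest the following troubleshooting steps:

    1. Try a Clean Boot

    a. Run MSCONFIG.

    b. On the General tab, select Selective startup.

    c. Clear the Process System.ini File, Process Win.ini File, and Load Startup Items check boxes, but keep Use Original Boot.ini selected.

    d. On the Services tab, first select Hide All Microsoft Services, then click Disable All.

    e. Restart the server and observe whether the problem still occurs.

    For detailed Clean Boot steps, see:

    http://support.microsoft.com/kb/310353/zh-cn

    In addition, I suggest trying the following:

    a. Boot the computer from the installation disc, start the Recovery Console, and use the Chkdsk command-line utility to confirm that the hard disk and file system are not damaged.

    b. Go to the Microsoft website and install all system updates.


    I hope this answer helps. If you have any further questions, please contact us again.


    If you have any comments or suggestions about our online forum support service, please let us know by email.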

    August 31, 2012 5:57
    Moderator

All replies

  • dcdiag output

    Directory Server Diagnosis

    Performing initial setup:
       Trying to find home server...
       Home Server = RTDC01
       * Identified AD Forest.
       Done gathering initial info.

    Doing initial required tests

       Testing server: Default-First-Site-Name\RTDC01
          Starting test: Connectivity
             ......................... RTDC01 passed test Connectivity

    Doing primary tests

       Testing server: Default-First-Site-Name\RTDC01
          Starting test: Advertising
             ......................... RTDC01 passed test Advertising
          Starting test: FrsEvent
             ......................... RTDC01 passed test FrsEvent
          Starting test: DFSREvent
             There are warning or error events within the last 24 hours after the
             SYSVOL has been shared.  Failing SYSVOL replication problems may cause
             Group Policy problems.
             ......................... RTDC01 failed test DFSREvent
          Starting test: SysVolCheck
             ......................... RTDC01 passed test SysVolCheck
          Starting test: KccEvent
             ......................... RTDC01 passed test KccEvent
          Starting test: KnowsOfRoleHolders
             ......................... RTDC01 passed test KnowsOfRoleHolders
          Starting test: MachineAccount
             ......................... RTDC01 passed test MachineAccount
          Starting test: NCSecDesc
             ......................... RTDC01 passed test NCSecDesc
          Starting test: NetLogons
             ......................... RTDC01 passed test NetLogons
          Starting test: ObjectsReplicated
             ......................... RTDC01 passed test ObjectsReplicated
          Starting test: Replications
             ......................... RTDC01 passed test Replications
          Starting test: RidManager
             ......................... RTDC01 passed test RidManager
          Starting test: Services
             ......................... RTDC01 passed test Services
          Starting test: SystemLog
             A warning event occurred.  EventID: 0x000003FC
                Time Generated: 08/31/2012   13:23:39
                Event String:
                Scope, 192.168.70.0, is 96 percent full with only 1 IP addresses remaining.
             ......................... RTDC01 passed test SystemLog
          Starting test: VerifyReferences
             ......................... RTDC01 passed test VerifyReferences

       Running partition tests on : ForestDnsZones
          Starting test: CheckSDRefDom
             ......................... ForestDnsZones passed test CheckSDRefDom
          Starting test: CrossRefValidation
             ......................... ForestDnsZones passed test
             CrossRefValidation

       Running partition tests on : DomainDnsZones
          Starting test: CheckSDRefDom
             ......................... DomainDnsZones passed test CheckSDRefDom
          Starting test: CrossRefValidation
             ......................... DomainDnsZones passed test
             CrossRefValidation

       Running partition tests on : Schema
          Starting test: CheckSDRefDom
             ......................... Schema passed test CheckSDRefDom
          Starting test: CrossRefValidation
             ......................... Schema passed test CrossRefValidation

       Running partition tests on : Configuration
          Starting test: CheckSDRefDom
             ......................... Configuration passed test CheckSDRefDom
          Starting test: CrossRefValidation
             ......................... Configuration passed test CrossRefValidation

       Running partition tests on : rt
          Starting test: CheckSDRefDom
             ......................... rt passed test CheckSDRefDom
          Starting test: CrossRefValidation
             ......................... rt passed test CrossRefValidation

       Running enterprise tests on : rt.console.local
          Starting test: LocatorCheck
             ......................... rt.console.local passed test
             LocatorCheck
          Starting test: Intersite
             ......................... rt.console.local passed test Intersite

    August 31, 2012 5:48
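With this much dcdiag output, the single failed test (DFSREvent) is easy to miss. The following is a minimal Python sketch for pulling the failed tests out of saved dcdiag text. The function name `failed_tests` and the regular expression are illustrative assumptions, not part of dcdiag, and the pattern is based only on the line format visible in the output above.

```python
import re

# Matches dcdiag result lines such as
# "......................... RTDC01 failed test DFSREvent".
RESULT_RE = re.compile(r"\.{5,}\s+(\S+)\s+(passed|failed)\s+test\s*(\S*)")


def failed_tests(dcdiag_text):
    """Return (target, test_name) pairs for every failed dcdiag test."""
    lines = [ln.strip() for ln in dcdiag_text.splitlines() if ln.strip()]
    failures = []
    for i, line in enumerate(lines):
        match = RESULT_RE.search(line)
        if not match:
            continue
        target, verdict, test = match.groups()
        # dcdiag wraps long lines, so the test name may sit on the next line.
        if not test and i + 1 < len(lines):
            test = lines[i + 1]
        if verdict == "failed":
            failures.append((target, test))
    return failures


# Excerpt copied from the output posted above.
sample = """
      Starting test: FrsEvent
         ......................... RTDC01 passed test FrsEvent
      Starting test: DFSREvent
         ......................... RTDC01 failed test DFSREvent
      Starting test: LocatorCheck
         ......................... rt.console.local passed test
         LocatorCheck
"""

print(failed_tests(sample))  # [('RTDC01', 'DFSREvent')]
```

Run against the full output, this should report only DFSREvent on RTDC01, which lines up with the DFSR event posted later in the thread.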
  • ADWS error log (Event ID 1202)

    Log Name:      Active Directory Web Services
    Source:        ADWS
    Date:          2012/8/31 11:25:55
    Event ID:      1202
    Task Category: ADWS Instance Events
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      RTDC01.rt.console.local
    Description:
    This computer is now hosting the specified directory instance, but Active Directory Web Services could not service it. Active Directory Web Services will retry this operation periodically.
     
     Directory instance: NTDS
     Directory instance LDAP port: 389
     Directory instance SSL port: 636
    
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="ADWS" />
        <EventID Qualifiers="49152">1202</EventID>
        <Level>2</Level>
        <Task>3</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-08-31T03:25:55.000000000Z" />
        <EventRecordID>1183</EventRecordID>
        <Channel>Active Directory Web Services</Channel>
        <Computer>RTDC01.rt.console.local</Computer>
        <Security />
      </System>
      <EventData>
        <Data>NTDS</Data>
        <Data>389</Data>
        <Data>636</Data>
      </EventData>
    </Event>

    DFSR error log (Event ID 1202)

    Log Name:      DFS Replication
    Source:        DFSR
    Date:          2012/8/31 11:24:52
    Event ID:      1202
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      RTDC01.rt.console.local
    Description:
    The DFS Replication service failed to contact domain controller  to access configuration information. Replication is stopped. The service will try again during the next configuration polling cycle, which will occur in 60 minutes. This event can be caused by TCP/IP connectivity, firewall, Active Directory Domain Services, or DNS issues. 
     
    Additional Information: 
    Error: 160 (One or more arguments are not correct.)
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="DFSR" />
        <EventID Qualifiers="49152">1202</EventID>
        <Level>2</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-08-31T03:24:52.000000000Z" />
        <EventRecordID>4163</EventRecordID>
        <Channel>DFS Replication</Channel>
        <Computer>RTDC01.rt.console.local</Computer>
        <Security />
      </System>
      <EventData>
        <Data>
        </Data>
        <Data>60</Data>
        <Data>160</Data>
        <Data>One or more arguments are not correct.</Data>
      </EventData>
    </Event>


    DS log (Event ID 2886)

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          2012/8/31 11:22:48
    Event ID:      2886
    Task Category: LDAP Interface
    Level:         Warning
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      RTDC02.rt.console.local
    Description:
    The security of this directory server can be significantly enhanced by configuring the server to reject SASL (Negotiate,  Kerberos, NTLM, or Digest) LDAP binds that do not request signing (integrity verification) and LDAP simple binds that  are performed on a cleartext (non-SSL/TLS-encrypted) connection.  Even if no clients are using such binds, configuring the server to reject them will improve the security of this server.
     
    Some clients may currently be relying on unsigned SASL binds or LDAP simple binds over a non-SSL/TLS connection, and will stop working if this configuration change is made.  To assist in identifying these clients, if such binds occur this  directory server will log a summary event once every 24 hours indicating how many such binds  occurred.  You are encouraged to configure those clients to not use such binds.  Once no such events are observed  for an extended period, it is recommended that you configure the server to reject such binds.
     
    For more details and information on how to make this configuration change to the server, please see http://go.microsoft.com/fwlink/?LinkID=87923.
     
    You can enable additional logging to log an event each time a client makes such a bind, including information on which client made the bind.  To do so, please raise the setting for the "LDAP Interface Events" event logging category to level 2 or higher.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS General" />
        <EventID Qualifiers="32768">2886</EventID>
        <Version>0</Version>
        <Level>3</Level>
        <Task>16</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-08-31T03:22:48.723920100Z" />
        <EventRecordID>4100</EventRecordID>
        <Correlation />
        <Execution ProcessID="572" ThreadID="744" />
        <Channel>Directory Service</Channel>
        <Computer>RTDC02.rt.console.local</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
      </EventData>
    </Event>
    • Edited by LingPing, August 31, 2012 6:03
    August 31, 2012 6:01
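The Event Xml blocks carry the same facts as the text headers in a machine-readable form, which helps when comparing events from both DCs. Below is a rough Python sketch that reduces one such block to a few fields. The function name `summarize_event` and the default UTC+8 offset are assumptions; the offset was chosen because the posted `SystemTime` values (03:25:55Z) sit eight hours behind the `Date` lines (11:25:55).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
# Standard Windows event level numbers.
LEVELS = {"1": "Critical", "2": "Error", "3": "Warning", "4": "Information"}


def summarize_event(xml_text, utc_offset_hours=8):
    """Extract provider, event ID, level, computer and local time."""
    system = ET.fromstring(xml_text.strip()).find("e:System", NS)
    raw = system.find("e:TimeCreated", NS).get("SystemTime")
    # Keep whole seconds only; the logs carry nine fractional digits.
    utc = datetime.strptime(raw[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": LEVELS.get(system.find("e:Level", NS).text, "Unknown"),
        "computer": system.find("e:Computer", NS).text,
        "local_time": local.strftime("%Y-%m-%d %H:%M:%S"),
    }


# The ADWS event XML as posted above.
adws_xml = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ADWS" />
    <EventID Qualifiers="49152">1202</EventID>
    <Level>2</Level>
    <Task>3</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2012-08-31T03:25:55.000000000Z" />
    <EventRecordID>1183</EventRecordID>
    <Channel>Active Directory Web Services</Channel>
    <Computer>RTDC01.rt.console.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>NTDS</Data>
    <Data>389</Data>
    <Data>636</Data>
  </EventData>
</Event>
"""

print(summarize_event(adws_xml))
```

The same function should work on the DFSR and Directory Service blocks, since they share the schema namespace.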
  • DS log (Event ID 2092)

    Log Name:      Directory Service
     Source:        Microsoft-Windows-ActiveDirectory_DomainService
     Date:          2012/8/31 11:23:49
     Event ID:      2092
     Task Category: Replication
     Level:         Warning
     Keywords:      Classic
     User:          ANONYMOUS LOGON
     Computer:      RTDC02.rt.console.local
     Description:
     
    This server is the owner of the following FSMO role, but does not consider it valid. For the partition which contains the FSMO, this server has not replicated successfully with any of its partners since this server has been restarted. Replication errors are preventing validation of this role. 
     
     Operations which require contacting a FSMO operation master will fail until this condition is corrected.
      
     FSMO Role: DC=rt,DC=console,DC=local 
     
     User Action: 
     
     1. Initial synchronization is the first early replications done by a system as it is starting. A failure to initially synchronize may explain why a FSMO role cannot be validated. This process is explained in KB article 305476.
     2. This server has one or more replication partners, and replication is failing for all of these partners. Use the command repadmin /showrepl to display the replication errors.  Correct the error in question. For example there maybe problems with IP connectivity, DNS name resolution, or security authentication that are preventing successful replication.
     3. In the rare event that all replication partners being down is an expected occurance, perhaps because of maintenance or a disaster recovery, you can force the role to be validated. This can be done by using NTDSUTIL.EXE to seize the role to the same server. This may be done using the steps provided in KB articles 255504 and 324801 on http://support.microsoft.com. 
     
     The following operations may be impacted: 
    Schema: You will no longer be able to modify the schema for this forest. 
    Domain Naming: You will no longer be able to add or remove domains from this forest.
     PDC: You will no longer be able to perform primary domain controller operations, such as Group Policy updates and password resets for non-Active Directory Domain Services accounts.
     RID: You will not be able to allocation new security identifiers for new user accounts, computer accounts or security groups.
     Infrastructure: Cross-domain name references, such as universal group memberships, will not be updated properly if their target object is moved or renamed.
     Event Xml:
     <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
       <System>
         <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS General" />
         <EventID Qualifiers="32768">2092</EventID>
         <Version>0</Version>
         <Level>3</Level>
         <Task>5</Task>
         <Opcode>0</Opcode>
         <Keywords>0x8080000000000000</Keywords>
         <TimeCreated SystemTime="2012-08-31T03:23:49.049226000Z" />
         <EventRecordID>4102</EventRecordID>
         <Correlation />
         <Execution ProcessID="572" ThreadID="728" />
         <Channel>Directory Service</Channel>
         <Computer>RTDC02.rt.console.local</Computer>
         <Security UserID="S-1-5-7" />
       </System>
       <EventData>
         <Data>DC=rt,DC=console,DC=local</Data>
       </EventData>
     </Event>
    

    August 31, 2012 6:02
  • DS error log (Event ID 2087)
     
    Log Name:      Directory Service
     Source:        Microsoft-Windows-ActiveDirectory_DomainService
     Date:          2012/8/31 11:22:40
     Event ID:      2087
     Task Category: DS RPC Client
     Level:         Error
     Keywords:      Classic
     User:          ANONYMOUS LOGON
     Computer:      RTDC02.rt.console.local
     Description:
     Active Directory Domain Services could not resolve the following DNS host name of the source domain controller to an IP address. This error prevents additions, deletions and changes in Active Directory Domain Services from replicating between one or more domain controllers in the forest. Security groups, group policy, users and computers and their passwords will be inconsistent between domain controllers until this error is resolved, potentially affecting logon authentication and access to network resources.
      
     Source domain controller: 
     RTDC01 
    Failing DNS host name: 
     e44d9bb9-6467-4c17-8d66-4d8705084a7d._msdcs.rt.console.local 
     
     NOTE: By default, only up to 10 DNS failures are shown for any given 12 hour period, even if more than 10 failures occur.  To log all individual failure events, set the following diagnostics registry value to 1:
      
     Registry Path: 
    HKLM\System\CurrentControlSet\Services\NTDS\Diagnostics\22 DS RPC Client 
     
     User Action: 
     
      1) If the source domain controller is no longer functioning or its operating system has been reinstalled with a different computer name or NTDSDSA object GUID, remove the source domain controller's metadata with ntdsutil.exe, using the steps outlined in MSKB article 216498. 
     
      2) Confirm that the source domain controller is running Active Directory Domain Services and is accessible on the network by typing "net view \\<source DC name>" or "ping <source DC name>".
      
      3) Verify that the source domain controller is using a valid DNS server for DNS services, and that the source domain controller's host record and CNAME record are correctly registered, using the DNS Enhanced version of DCDIAG.EXE available on http://www.microsoft.com/dns 
     
       dcdiag /test:dns 
     
      4) Verify that this destination domain controller is using a valid DNS server for DNS services, by running the DNS Enhanced version of DCDIAG.EXE command on the console of the destination domain controller, as follows:
      
       dcdiag /test:dns 
     
      5) For further analysis of DNS error failures see KB 824449: 
       http://support.microsoft.com/?kbid=824449 
     
     Additional Data 
    Error value: 
     11004 The requested name is valid, but no data of the requested type was found.
     
    Event Xml:
     <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
       <System>
         <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS General" />
         <EventID Qualifiers="49152">2087</EventID>
         <Version>0</Version>
         <Level>2</Level>
         <Task>22</Task>
         <Opcode>0</Opcode>
         <Keywords>0x8080000000000000</Keywords>
         <TimeCreated SystemTime="2012-08-31T03:22:40.705506000Z" />
         <EventRecordID>4099</EventRecordID>
         <Correlation />
         <Execution ProcessID="572" ThreadID="752" />
         <Channel>Directory Service</Channel>
         <Computer>RTDC02.rt.console.local</Computer>
         <Security UserID="S-1-5-7" />
       </System>
       <EventData>
         <Data>RTDC01</Data>
         <Data>e44d9bb9-6467-4c17-8d66-4d8705084a7d._msdcs.rt.console.local</Data>
         <Data>11004</Data>
         <Data>The requested name is valid, but no data of the requested type was found.</Data>
         <Data>System\CurrentControlSet\Services\NTDS\Diagnostics</Data>
         <Data>22 DS RPC Client</Data>
       </EventData>
     </Event>
    

    August 31, 2012 6:04
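Event 2087 says RTDC02 could not resolve the GUID-based name of RTDC01 under `_msdcs`. Besides `dcdiag /test:dns`, a quick scripted lookup can show whether a given name resolves from the machine it runs on. This is only a sketch: `resolve`, `check_names` and `fake_resolver` are hypothetical helpers, and the address 192.0.2.10 is a documentation placeholder, not a value from this network. On a real DC you would call `check_names(NAMES)` without the fake resolver.

```python
import socket


def resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if lookup fails."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def check_names(names, resolver=socket.getaddrinfo):
    """Map each name to the addresses it resolves to."""
    return {name: resolve(name, resolver) for name in names}


# Names taken from the Event 2087 text above.
NAMES = [
    "e44d9bb9-6467-4c17-8d66-4d8705084a7d._msdcs.rt.console.local",
    "RTDC01.rt.console.local",
]


def fake_resolver(host, port):
    """Hypothetical stand-in for DNS so the sketch runs anywhere."""
    if host.startswith("e44d9bb9"):
        raise socket.gaierror(11004, "no data of the requested type")
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]


for name, addrs in check_names(NAMES, fake_resolver).items():
    print(name, "->", addrs or "NOT RESOLVED")
```

If the host name resolves but the GUID name does not, that would point at the CNAME registration under `_msdcs` rather than at general connectivity, which is what step 3 of the event's User Action asks you to verify.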
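  • Hi Tom

    Thank you for your reply. Judging from the logs, though, this is not purely a problem with the operating system itself, and since both servers hang at the same time, I think an AD service failure is the more likely cause.

    Also, both servers are dedicated DCs with no other services or software installed.

    August 31, 2012 6:10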
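To weigh the AD-service theory, it can help to put the posted events from both DCs on one timeline. The sketch below simply sorts the five events by the local timestamps given in their `Date` lines; the helper name `timeline` is an assumption, and ordering alone does not prove which event caused which.

```python
from datetime import datetime

# (local time, computer, source, event ID) as posted in this thread.
events = [
    ("2012-08-31 11:25:55", "RTDC01", "ADWS", 1202),
    ("2012-08-31 11:24:52", "RTDC01", "DFSR", 1202),
    ("2012-08-31 11:22:48", "RTDC02", "NTDS General", 2886),
    ("2012-08-31 11:23:49", "RTDC02", "NTDS General", 2092),
    ("2012-08-31 11:22:40", "RTDC02", "NTDS General", 2087),
]


def timeline(rows):
    """Return the rows sorted by their timestamp, earliest first."""
    return sorted(rows, key=lambda r: datetime.strptime(r[0], "%Y-%m-%d %H:%M:%S"))


for when, computer, source, event_id in timeline(events):
    print(when, computer, source, event_id)
```

Sorted this way, the earliest posted entry is the DNS resolution failure (2087) on RTDC02, followed by the FSMO validation warning (2092) and then the DFSR and ADWS errors on RTDC01.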
  • Hi Tom,

    Do you have any other suggestions for this issue? Thanks!

    September 3, 2012 0:27
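  • Hello,

    Since your server hangs only occasionally, the problem may be related to the software running at that time. A server hang can have many possible causes, so resolving it may take considerable time.

    Possible reasons Windows fails to shut down properly:

    1. Click Start → Run and enter "EVENTVWR.MSC", then look in the System log for entries marked with an ×; these may be the cause of the hang.
    2. If the system cannot start, press F8 during startup to enter Safe Mode, or try Last Known Good Configuration, and investigate from there.
    3. If you have recently installed any devices or software, suspect them as the cause of the failure. Remove the new device or uninstall the new software to confirm.
    4. A hardware fault can also hang the system; start by troubleshooting the motherboard, memory, and hard disk.

    We also recommend downloading and installing the latest system updates.

    I hope this answer helps.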



    September 3, 2012 9:33
    Moderator