WHS random crashes

  • Question

  • I've recently built a WHS (PP3) using a Gigabyte GA-MA74S2H mobo, which uses the AMD 2100 integrated chipset. I have 1 x 2.5" drive for the O/S, and 4 x Samsung 1.5TB EcoGreen drives for storage (the mobo has 6 x SATA connectors). Memory is 2 x 1GB unbranded sticks.

    After having too many issues with the mobo running in legacy IDE mode, I switched to AHCI mode, and used the latest AMD AHCI driver I could find (which is, apparently, WS03 WHQL approved). I finally got the install completed, and performance was great (80MB/s read, 60MB/s write). I've been running in this mode ever since, but I'm getting random crashes which are very odd. The system can run for days with no problem, then crash + reboot when idle. Sometimes it will just reboot a couple of times close together.

    The minidumps I've looked at seem to indicate AHCIX86.SYS is the culprit, but I'm not so sure. The bugcheck code is:

    BugCheck 100000D1, {1b8, d0000009, 1, f72876c4}

    At the moment, I don't have a WHS I'm totally happy with because of these problems. Does anyone have any idea of what the problem might be?


    Friday, February 12, 2010 1:57 PM
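For readers who want to pick the bugcheck line above apart, here is a minimal sketch in Python. It assumes the commonly documented parameter layout for STOP 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL): the address that was referenced, the IRQL field, the access type (0 = read, 1 = write), and the address of the code that made the access. It also assumes the leading 0x1 in 100000D1 is a flag that can be masked off to get the base stop code. The function and field names are hypothetical and exist only for illustration.

```python
import re

def parse_bugcheck(line):
    """Split a 'BugCheck CODE, {a, b, c, d}' line into integers (all hex)."""
    m = re.match(r"\s*BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck line: %r" % line)
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args

def describe_d1(line):
    """Label the four parameters of a STOP 0xD1 bugcheck line."""
    code, args = parse_bugcheck(line)
    # Assumption: the top nibble is a flag, so mask it off to get the base code.
    base = code & 0x0FFFFFFF
    if base != 0xD1 or len(args) != 4:
        raise ValueError("expected a STOP 0xD1 line with four parameters")
    return {
        "stop_code": base,
        "referenced_address": args[0],
        "irql_field": args[1],
        # Assumed layout: 0 means read, 1 means write; anything else is left unlabeled.
        "access": {0: "read", 1: "write"}.get(args[2], "other"),
        "faulting_instruction": args[3],
    }

info = describe_d1("BugCheck 100000D1, {1b8, d0000009, 1, f72876c4}")
for name, value in info.items():
    print(name, hex(value) if isinstance(value, int) else value)
```

Read this way, the line describes a write to a very low address (0x1b8) made by code at 0xf72876c4. Loading the minidump in WinDbg and running `!analyze -v` should show which module owns that code address, which is a more direct check on whether AHCIX86.SYS is really involved.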

All replies

  • Random reboots/BSODs/crashes can sometimes be attributed to memory errors. Do you have an option to set the memory speeds on your motherboard? It might be a good idea to run:
    http://www.memtest86.com/


    Friday, February 12, 2010 2:09 PM
  • Thanks. The memory is standard DDR2-800 (not overclocked, and all BIOS memory settings set to 'auto'). I'm aware memory problems can give random crashes, so I'll see if I can run the memory test for a few loops to see if it's the culprit.


    Friday, February 12, 2010 2:52 PM
  • A STOP 0xD1 error (DRIVER_IRQL_NOT_LESS_OR_EQUAL) is most often caused by a driver, but it can also indicate a memory issue. You should run a memory test tool like the one Al has suggested for an extended period (hours to days).

    It's also possible that you have a HD issue. If the memory test doesn't indicate anything, you should run chkdsk on all the drives in your server.

    I'm not on the WHS team, I just post a lot. :)
    Friday, February 12, 2010 3:34 PM
    Moderator
  • Thanks for the help so far. Taking the advice on board, I downclocked the memory from DDR800 to DDR667. No obvious loss of performance, and initially at least everything seemed fine, so I thought it was memory related. Ran the server 24x7 for a week and it was perfect. Then, on the 8th day, crash. Came down in the morning, and it was sitting at the AHCI BIOS init screen. Oddly, three of the four storage drives had 'SMART error' showing. I had seen this before, but for three drives to show an error at the same time sounds like a false positive! Anyway, as the AMD AHCI driver doesn't support SMART passthrough, I disabled SMART in the BIOS and in the drives themselves using the Samsung ESTOOL utility, but the SMART errors still appear, so I think the chkdsk might be a good thing to try.

    I am running the latest AHCI driver though (WS03 WHQL approved). It's just so strange how the system can run for a week, then die. I really don't believe there's anything wrong with the drives, but I'll post back with the chkdsk results.
    Saturday, February 20, 2010 11:35 AM
  • ...
    Oddly, three of the 4 storage drives had 'SMART error' showing. I had seen this before, but for 3 drives to show an error at the same time, it sounds like a false positive!
    ...
    It could also be a HD controller issue.

    I'm not on the WHS team, I just post a lot. :)
    Sunday, February 21, 2010 8:02 PM
    Moderator