WHS Stalling, strange drive access pattern

  • Question

  • Hi!

    I've been running WHS on a computer I built myself for about 3 or 4 months now. I recently upgraded the 120-day evaluation to the full version using the guides on this website. While I was getting the server ready for migration to the full version, I noticed that when I was logged into the admin account, either locally or by remote desktop connection, the OS would seem to freeze for 5 to 10 minutes, then suddenly lurch into action again. This was not the case when I had originally installed the WHS trial. I figured that the reinstall would clear it, and just went ahead with it.

    Unfortunately, after the reinstall, the problem was still there. The system is sometimes fine, then sometimes frozen in some kind of cryo stasis, but not crashing. This stalls the OS for the logged-in user while I'm trying to administer the computer. It also affects the WHS connector, which reports that the server cannot be contacted (since it's currently stalled out). It also affects reads from and writes to a shared folder on the server, causing programs to time out.

    I looked at the task manager to see if my CPU went to 100%, but this is not occurring, and the HDD is often stuck at 0% during these spells. When the system is running smoothly, there are no errors for drives or folders in the WHS connector. I have run WD diagnostics on the two drives in the computer (2x 1.5TB WD Caviar Green), and the tests came up clean. I ran these tests in the Windows version of the diagnostic, and do not trust them as much as a DOS-based boot disk diagnostic, which I will be running this week when I have time to extract the floppy maker onto a USB drive.

    The only thing I have found is in performance monitor, the HDD usage spikes from 100% to 0% each time the monitor ticks forward, creating a pattern that looks similar to a gamma ray sine wave. It's just too uniform a pattern to be unrelated. (I'd post a screenshot, but I'm not sure how).

    Has anyone run into something like this? I'm completely baffled, but I'm a desktop tech, not so much a server tech.

    Thanks,

    James
    Sunday, January 31, 2010 4:00 AM

All replies

  • Have you checked your SATA controller against your hard disks? They could be listed as having a known problem. Is there a more recent driver for your SATA controller?

    Another thing: do you have any add-ins that are reading temperature information? If your drives are not running in DMA mode, I think these can cause a problem (check under Device Manager).



    --
    Sunday, January 31, 2010 3:02 PM
  • This happened to me... it turned out one of my drives was in the process of dying. The controller on one of the drives was failing, and since the failure was sporadic, the problem came and went. I have used the Windows utility, and the DOS one is more reliable.

    good luck.
    • Proposed as answer by DrX69 Monday, February 8, 2010 4:48 PM
    • Marked as answer by Jonas Svensson -FST- Monday, February 8, 2010 9:55 PM
    • Unmarked as answer by frosty2k Monday, February 15, 2010 4:30 PM
    • Unproposed as answer by kariya21 (Moderator) Tuesday, February 16, 2010 12:59 AM
    Sunday, January 31, 2010 5:42 PM
  • This was NOT the answer to my question. I backed up all of my data on the server, and then ran the WD drive diagnostics (both the safe one, and the ones that write on the drive). These diagnostics all came back clean. I then re-formatted the disks, added 2 smaller drives that I knew were clean (320, 200) and reinstalled WHS on a clean drive. It seemed fine for a little while, but the problem persists.

    I installed the latest SATA drivers from the MB manufacturer, and that still did not fix it. The problem seems to get worse the more data is stored on the server. Right now I am at 1.8TB free out of 3.2TB total.

    Anyone else have any thoughts? As I said, this answer is false and did not solve the issue.

    James
    Monday, February 15, 2010 4:30 PM
  • You didn't get the drives that have 4k sectors, did you? If you did, that would explain this issue.
    Monday, February 15, 2010 4:52 PM
  • Actually, I have no add-ins running at all. It is a completely fresh install of WHS from OEM disks.

    I will try to locate a compatibility list for my SATA controller. The board is an Asus P5Q, which uses the Intel P45 chipset. I think the SATA controller is part of the ICH10R southbridge. There is also a Marvell 88SE6111 RAID controller, which I am not using, but I updated its drivers anyway.
    Monday, February 15, 2010 7:26 PM
  • They are WD15EADS-00P8B0 drives with 01.00A01 firmware... what are the 4k sector drives?


    Monday, February 15, 2010 7:27 PM
  • The EARS series from WD has 4k sectors. If you want to know about this feature, please read:
    http://www.anandtech.com/storage/showdoc.aspx?i=3691
    The EADS series, which you have, are "normal" drives, so at least now you know your problem is not related to the 4k sector disks.
    As for other suggestions for your problem, sorry, I cannot think of anything. Maybe put the drives on the Marvell controller, just to test?

    Chris
    Tuesday, February 16, 2010 2:58 PM
  • I think I may have determined the issue.

    Apparently the 00P8B0 revision of the WD15EADS are, shall we say, developmentally delayed. It sounds like there is an issue with the controller on many of these drives that makes them get 'confused' and stall out for 10-15 mins at a time, especially when transferring large files (I have a lot of videos). This does not trigger a failure in WD Data Lifeguard for some reason. Four of the 20 or so posts on the first page of the WD forums are about this specific failure on these exact drives :(

    I have opened up Performance Monitor, and now have separate graphs for each drive, in an attempt to determine which of the drives, or both, is experiencing the issue. When I transfer a big video_ts file to the server, it always seems to pick drive 3, and I have confirmed that this is a bum drive. I will be sending in an RMA for this one, but I need to know if the other one is bad too.

    Does anyone know if I can override the copy to drive 3 and force a copy to drive 1 (the other suspect drive)? Either that, or can I run a chkdsk on a specific physical drive even though they are all in the WHS matrix partition thing? (my system is on drive 0, and drive 2 is one of the other good drives I installed) I tried to do a chkdsk before, but it looks like it scans the D: drive which is the entire drive matrix.

    I really don't want to pull the drive and run it on my desktop if I don't have to.

    Oh and thanks to everyone for leading me down the right pathways.

    James
    Tuesday, February 16, 2010 11:51 PM
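
The per-drive graph comparison described in the post above could also be done on logged numbers. This is only a minimal sketch: it assumes the per-drive activity readings have already been exported as plain lists of percentages, and the function name `find_stalls`, the sample values, and the thresholds are all hypothetical. A "stall" here simply means a run of consecutive idle samples, matching the 0% readings reported during the freezes; it is only meaningful for samples taken while a copy is actually in progress.

```python
from typing import Dict, List, Tuple


def find_stalls(samples: List[float], min_run: int = 3,
                idle_threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of idle samples
    that is at least min_run samples long."""
    stalls: List[Tuple[int, int]] = []
    run_start = None
    for i, value in enumerate(samples):
        if value <= idle_threshold:
            # Sample is idle: open a run if one is not already open.
            if run_start is None:
                run_start = i
        else:
            # Sample is busy: close the open run and keep it if long enough.
            if run_start is not None and i - run_start >= min_run:
                stalls.append((run_start, i - run_start))
            run_start = None
    # Handle a run that is still open when the samples end.
    if run_start is not None and len(samples) - run_start >= min_run:
        stalls.append((run_start, len(samples) - run_start))
    return stalls


# Hypothetical readings (percent busy), one list per physical drive.
drive_samples: Dict[str, List[float]] = {
    "drive 1": [40, 35, 50, 45, 30, 42, 38, 41],
    "drive 3": [60, 55, 0, 0, 0, 0, 48, 52],
}

stall_report = {name: find_stalls(values)
                for name, values in drive_samples.items()}
for name, stalls in stall_report.items():
    print(name, stalls)
```

With real data, `min_run` would need to be matched to the sampling interval so that short idle gaps between files are not counted as stalls.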
  • James wrote: "Does anyone know if I can override the copy to drive 3 and force a copy to drive 1 (the other suspect drive)?"

    No.

    James wrote: "Either that, or can I run a chkdsk on a specific physical drive even though they are all in the WHS matrix partition thing?"

    Yes. Try chkdsk c:\fs\x /r (where x = the volume mount point of the drive in question).

    James wrote: "(my system is on drive 0, and drive 2 is one of the other good drives I installed) I tried to do a chkdsk before, but it looks like it scans the D: drive which is the entire drive matrix."

    No, it's not. Running chkdsk /r on D only checks D, not the secondary drives.

    Wednesday, February 17, 2010 3:21 AM
    Moderator
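
The per-drive chkdsk suggestion in the last reply could be scripted roughly as follows. This is a hedged sketch, not a tested WHS tool: it assumes, based only on that reply, that each secondary drive appears as a folder under c:\fs, and the helper name `chkdsk_commands` is made up for illustration. It only builds and prints the command lines and does not run chkdsk itself. The demo uses a temporary directory in place of c:\fs so it can run on any machine.

```python
import os
import tempfile
from typing import List


def chkdsk_commands(fs_root: str) -> List[str]:
    """Build one 'chkdsk <mount point> /r' command line for each
    folder found directly under fs_root (plain files are skipped)."""
    commands = []
    for entry in sorted(os.listdir(fs_root)):
        path = os.path.join(fs_root, entry)
        if os.path.isdir(path):
            commands.append(f"chkdsk {path} /r")
    return commands


# Demo with a stand-in directory; the folder names are hypothetical.
with tempfile.TemporaryDirectory() as demo_root:
    for mount_point in ("E", "F"):
        os.mkdir(os.path.join(demo_root, mount_point))
    for command in chkdsk_commands(demo_root):
        print(command)
```

On the server itself the call would presumably be `chkdsk_commands(r"c:\fs")`, with each printed line then run by hand from an administrator command prompt, one drive at a time.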