Suspect A Bad Drive: Troubleshooting Strategy?

  • Question

  • Last couple days my WHS system has been acting up.

    Froze three times: once last night, three times after work today.

    Freezes part way through a ChkDsk D: /F /R

    Freezes part way through a plain old ChkDsk D:

    In prior runs, ChkDsk has indicated that there are a few bad sectors somewhere.

    I have the system on an IDE drive and the rest is on 4 1-TB SATA drives.

    Still having a free SATA receptacle on the mobo, I'm thinking that I should get a 1.5-TB SATA drive, plug it in, add it to the WHS pool, and then for each 1-TB drive:

    • Remove it from the pool
    • Do a low-level format
    • Put it back in the pool

    Cross my fingers and wait....

    Does this sound reasonable?
    Tuesday, July 14, 2009 12:59 AM

All replies

  • Does this sound reasonable?

    Honestly, no.  Clearly your primary drive is failing (it can't even finish checking the D partition).  Why wouldn't you just replace it?
    Tuesday, July 14, 2009 1:31 AM
    Moderator
  • I had thought that D included the pool.

    If not and D is bad, then that means replacing the system drive - on which D must be a partition.

    Is this doable without losing the pooled data?  Seems like I've read that it is, but a term of art would help in tracking down the procedure.
    Tuesday, July 14, 2009 11:15 AM
  • Hi,
    as long as the shared folders are duplicated, the only data you could potentially lose is part of the backup database. Since the backup database is not redundant, the only way to recover it would be to redo all the backups.
    Could you log in locally on your server and check the Event Viewer to see if there are more details about the crashes (e.g. warnings or errors with source NTFS)?
    Could something else be causing the crashes (e.g. thermal issues, dust on the coolers, a failing mainboard, or an underpowered power supply)?

    Also read the FAQ "How to recover data after server failure" to understand how you can recover your data if a server reinstallation is not offered after swapping the potentially damaged disk.

    Best greetings from Germany
    Olaf
    Tuesday, July 14, 2009 11:46 AM
    Moderator
  • ... a term of art ...
    Server reinstallation. It's all over the forums, but here's a link to a (sketchy) description in the FAQ section.

    As far as failing drives are concerned: Disk drives are cheap, relative to the value of the data on them. If you have a failing drive, you should replace it. Having chkdsk die part way through a run is a solid indication that your drive (or disk controller) is having serious issues.

    I'm not on the WHS team, I just post a lot. :)
    Tuesday, July 14, 2009 3:49 PM
    Moderator
  • Server reinstallation. It's all over the forums...
    Disk drives are cheap..

    Ok, then it's a new system drive and Server Reinstallation.

    But I've also been lusting after more drives for awhile now - my current case being limited to 5.

    So, while I'm at it, maybe I will replace the case with a real server case like this Norco product: http://tinyurl.com/npyrp4 at the same time I do the Server Reinstallation.

    Aside from size/orientation, the case's defining feature seems to be something called a "Backplane" - which I can spell, but not much more - except that it serves up 20 "hot swappable" SATA bays.

    My plan would be:

    1. Bolt over the mobo
    2. Hook up the new system ("Primary"?) drive
    3. Temporarily hang my four 1-TB SATA drives directly on to the mobo's SATA receptacles just like they are now
    4. Do the Reinstallation
    5. Add a 1.5 TB SATA drive to the "backplane"
    6. Tell WHS to add that drive to its drive pool.
    7. Repeat 5 & 6 until I have 4.5 TB of additional storage space available in the WHS pool.
    8. Tell WHS to remove one of the old 1-TB SATA drives from the pool (at which time I assume it will copy the data from the drive being removed to available pool space)
    9. Disconnect the old 1-TB drive from the mobo
    10. Connect said drive to the backplane
    11. Tell WHS to add it to the pool
    12. Repeat steps 8-11 until the remaining three 1-TB drives have been cycled out of/back into the pool.
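
    As a rough sanity check on the capacity arithmetic in the steps above, here is a minimal Python sketch. It is a toy model only, not how WHS actually places or duplicates data: the `Pool` class, its method names, the drive labels, and the 4 TB "used" figure are all hypothetical and chosen just for illustration. The one rule it encodes is the assumption from step 8, that a drive can only be removed if the data still fits on the drives left in the pool.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy storage pool: drive label -> capacity in TB (hypothetical model)."""
    used_tb: float = 0.0
    drives: dict = field(default_factory=dict)

    def capacity(self) -> float:
        return sum(self.drives.values())

    def add(self, name: str, size_tb: float) -> None:
        self.drives[name] = size_tb

    def remove(self, name: str) -> None:
        # Assumed rule: removal only succeeds if the stored data
        # still fits on the drives that remain in the pool.
        remaining = self.capacity() - self.drives[name]
        if self.used_tb > remaining:
            raise ValueError(f"not enough free space to remove {name}")
        del self.drives[name]


def simulate_plan(used_tb: float) -> float:
    """Walk through steps 5-12 and return the final pool capacity in TB."""
    pool = Pool(used_tb=used_tb)
    for i in range(4):                 # the four existing 1-TB drives
        pool.add(f"old{i}", 1.0)
    for i in range(3):                 # steps 5-7: 3 x 1.5 TB = 4.5 TB added
        pool.add(f"new{i}", 1.5)
    for i in range(4):                 # steps 8-12: cycle each old drive
        pool.remove(f"old{i}")
        pool.add(f"old{i}", 1.0)
    return pool.capacity()


if __name__ == "__main__":
    print(simulate_plan(used_tb=4.0))
```

    The point of adding the new drives first is visible in the model: with only the four old drives and all 4 TB in use, the very first removal would be refused.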

    Should this work?

    My main concern is that I don't have a clue vis-a-vis "backplane" and wouldn't want some gotcha to arise with WHS and its drive pool.

    OTOH, the Norco box is a "Server" case and WHS is a "Server".... so it's probably a marriage made in heaven.

    But it would be nice to be sure before I lay out the long green.



    Finally/Tangentially:

    It seems like the spirit of WHS is at least partially  "Don't worry about the data, we'll take care of it." 

    If that characterization is correct, do I not need to bother with running ChkDsk against the WHS pool drives (assuming I even knew how...)? I.e., will WHS periodically scan its pool and issue some sort of "Push"/"Interruptive" notification if it thinks things are going South pool-wise?
    Tuesday, July 14, 2009 9:42 PM
  • Your plan looks fine. And yes, the Norco server chassis are used by many for WHS. They aren't the quietest boxes, but the backplanes and hard drive trays make them very useful for lots of drives.

    chkdsk is automatically performed every 24 hours. If a drive is failing, it will be marked so in the console. I expect that WHS will stop using that drive to store data until a repair is done.
    Thursday, July 16, 2009 6:45 PM
  • To clarify Evaders99's response: you will receive a notification in the Console about a failing hard drive, but only after that drive fails 4 consecutive chkdsks (which, since it only checks once a day, means you won't find out until 4 days have passed).
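
    To make the "4 consecutive failures, one check per day" timing concrete, here is a small Python sketch. It is only an illustration of the counting logic as described in this post, not WHS code: the function name and the boolean input format are made up, and it assumes that a passing check resets the failure count (which is what "consecutive" implies).

```python
def day_of_alert(daily_results, threshold=4):
    """Return the 1-based day on which an alert would fire, or None.

    daily_results: iterable of booleans, one per daily check
                   (True = check passed, False = check failed).
    threshold:     consecutive failures needed before alerting
                   (4, per the description above).
    """
    consecutive = 0
    for day, passed in enumerate(daily_results, start=1):
        # Assumption: any passing check resets the streak.
        consecutive = 0 if passed else consecutive + 1
        if consecutive >= threshold:
            return day
    return None


if __name__ == "__main__":
    # A drive that fails every check from day 1 onward.
    print(day_of_alert([False, False, False, False, False]))
```

    Under these assumptions, a drive that starts failing today is not flagged until the fourth daily check, and a single passing check in between pushes the alert out further.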
    Friday, July 17, 2009 1:54 AM
    Moderator