Failing Hard Disk in WHS, how to recover

  • Question

  • Please help.
    I am stuck in the following somewhat absurd situation, and I really can not believe there is no way out.

    - added a new Samsung 1.5 TB hard disk to the WHS (Acer H340) as a data disk
    - removed a previous smaller disk w/o problems, WHS was working fine for a few days (total of 3 disks: 1+1+1.5 TB)
    - then the new disk began to fail, ntfs errors in the log, in the console I get "unexpected" errors in the storage manager, storage pool size
    - tried 'repair', fails again with "unexpected" errors, tried CHKDSK, no significant errors, no luck

    Since all important directories are duplicated, I tried this:
    - Add a new Samsung 1.5 TB disk -> no problems
    - Try to remove the faulty disk: this will not let me because there is not enough space (can not be true as I just added the new, same sized disk, and there is still space left on the other 2)

    Now I am stuck: documentation warns me against just pulling the faulty disk, because it could permanently delete the duplicated data, but, as I understand it, I cannot remove the faulty disk precisely for the reason that it is faulty and therefore WHS can not determine the sizes?

    This really makes me wonder what the point is of redundant storage (or maybe even the point of having WHS). The other baffling thing it did was 'repairing' the backup database by throwing away all backups!

    Or am I missing something? Samsung HD154UI not really supported or something? But still, for whatever reason the disk is failing I still expect to get a straightforward path to rescue the duplicate data?

    If someone could give me some pointers?

    Many thanks,

    Paul
    Sunday, October 25, 2009 9:52 AM

Answers

  • What errors is chkdsk reporting when you run it on all the drives in your server?

    You can remove the disk as follows:
    • Shut your server down.
    • Physically remove the disk. (Don't throw it away, you'll want it in a moment. :))
    • Boot your server.
    • In the console, check to make sure you physically removed the correct disk (there's a disk listed as "missing" now; make sure it's the one that was "failing" previously).
    • Assuming you got the right disk, remove it. This will be fast, and may warn you about the potential loss of files and/or backups. Accept this for now if such warnings appear.
    • Now connect the failing disk to some other computer and follow the steps you will find here to recover as many files from your shares as possible.
    • If your server warned you would lose backups, consider your backup database irretrievably damaged. (It's possible to reassemble the database assuming none of the files are damaged, but not usually worth the effort.) Run the backup repair option in the console.
    If any files in unduplicated shares were stored on the failing hard drive, it's possible that they are irretrievably lost. If they are essential, you can consider a data recovery service, but I will warn you that it will be extremely expensive.
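The file-recovery step above could be scripted roughly as follows. This is a minimal Python sketch, not the procedure from the linked steps: it assumes the failing disk still mounts readably on the other computer and that the share data sits in a folder tree you can browse there (on WHS data disks this is commonly a hidden DE\shares folder, but treat that path as an assumption). The function name salvage_tree and the example paths are hypothetical.

```python
import os
import shutil
from pathlib import Path


def salvage_tree(source, dest):
    """Copy every readable file under source into dest, keeping relative paths.

    Returns (copied, failed). A file that cannot be read is recorded in
    failed and skipped, so one bad sector does not abort the whole run.
    """
    source, dest = Path(source), Path(dest)
    copied, failed = [], []
    # os.walk skips directories it cannot list instead of raising.
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = Path(dirpath) / name
            dst = dest / src.relative_to(source)
            try:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copies data and timestamps
                copied.append(dst)
            except OSError:
                failed.append(src)
    return copied, failed
```

A call might look like salvage_tree(r"E:\DE\shares", r"D:\rescued"), where both paths are placeholders. Copy to a healthy disk, never back onto the failing one, and review the failed list afterwards to see what is still missing.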
    I'm not on the WHS team, I just post a lot. :)
    • Marked as answer by Paul Zuh Tuesday, October 27, 2009 11:07 PM
    Sunday, October 25, 2009 2:02 PM
    Moderator

All replies

  • Hello Ken,

    Many thanks for the quick answer. It sounds hopeful, as most of the essential data was duplicated.

    For chkdsk the only tangible error so far was "5 unindexed files processed", and a freeze somewhere halfway into stage 4. I am rerunning it with this script

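    rem Stop the WHS services that keep the volumes in use
    net stop pdl
    net stop whsbackup
    rem /x forces a dismount first, /r locates bad sectors and recovers readable data
    chkdsk D: /x /r
    chkdsk C: /x /r
    rem Check every disk mounted under C:\fs, each in its own window
    for /d %%D in (C:\fs\*) do start chkdsk /x /r "%%D"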


    found elsewhere on this forum, still hoping to make some progress there; it was a brand new disk after all.

    If this does not work out I will pull the disk as per your description. Will let you know how it works out.

    Paul
    Sunday, October 25, 2009 7:53 PM
  • chkdsk freezing or locking your server partway through scanning a drive is a very bad sign. If the drive manufacturer has a diagnostic tool you can try running that instead, but in all probability the drive is dying. The other main possibility is a failing stick of RAM, but given that the disk is new, I'm more inclined to suspect the disk than RAM.
    I'm not on the WHS team, I just post a lot. :)
    Monday, October 26, 2009 1:31 AM
    Moderator
  • Pulled the disk and seem to have all duplicated data available (but how will I ever verify that?). Non-duplicated data seems all to have gone to the big void. Attaching the disk to a regular PC shows it as a 1397 GB RAW partition and a 9 MB unallocated one (is that a WHS thing?). Nothing there to salvage. Once I have entered into the stage of acceptance I will reformat it and determine who is to blame: Acer, MS, or Samsung.
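On the 1397 figure: drives are sold in decimal terabytes while Windows reports sizes in binary units, so a healthy 1.5 TB disk would show about the same number. A quick check in Python, where the helper name is only illustrative and the exact byte count of this drive model is assumed to be close to the nominal 1.5 TB:

```python
def decimal_tb_to_binary_gb(terabytes):
    """Convert a marketed size in TB (10**12 bytes) to binary GB (2**30 bytes)."""
    return terabytes * 10**12 / 2**30


# Nominal capacity of the 1.5 TB disk as Windows would display it.
print(round(decimal_tb_to_binary_gb(1.5)))  # prints 1397
```

So the partition size itself looks normal. It is the RAW state, meaning Windows finds no file system it recognizes, that points at damage.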

    Meanwhile I have come to the conclusion that WHS is a nice multimedia server but lacks the tooling to be a serious data storage device. Failure of one disk brought it into a state of almost complete uselessness (I even had a lot of trouble getting back in). Repairs did not work and gave no useful information: despite ruining the best part of a weekend I still have no concrete idea of what went wrong with the disk. A tool like chkdsk is really not something you would expect to have to use in the 21st century, and communicating error situations by crashing is not very effective. Neither are approx. 200 stacked "unexpected error" messages.

    I still need a decent solution for secure storage, and in this respect WHS now looks to me like a great car; it is just that the brakes are not there, and the airbags are not there yet.

    That off my chest, I really appreciate the generous and kind way in which I got answers in this forum. Thanks Ken.
    Monday, October 26, 2009 12:13 PM
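On the question of how to verify that the duplicated data survived: one possible approach, sketched here in Python, is to hash every file in a share and compare the result against a copy you trust, such as the originals on a client PC or an external backup. This is not a WHS feature. The function names are hypothetical, and the method only helps if such a reference copy exists.

```python
import hashlib
import os
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root onto its digest (None if unreadable)."""
    root = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            key = path.relative_to(root).as_posix()
            try:
                manifest[key] = hash_file(path)
            except OSError:
                manifest[key] = None
    return manifest


def compare_manifests(reference, candidate):
    """Return (missing, changed) file lists, judged against the reference."""
    missing = sorted(k for k in reference if k not in candidate)
    changed = sorted(
        k for k in reference
        if k in candidate
        and (candidate[k] is None or candidate[k] != reference[k])
    )
    return missing, changed
```

Build one manifest from the trusted copy and one from the server share, then pass them to compare_manifests. Empty results mean every reference file is present on the server with identical content. Files that exist only on the server are not reported.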
  • I'm sorry, but I have to disagree with you on a few things:

    IMHO WHS has a great tool to prevent data loss from a single disk failure: "duplication". You can enable this for all shares from the WHS console, and if you also want your backups protected this way you can use the WHS BDBB Add-in to enable duplication for the backups as well. In addition, WHS has an option to back up data in shares to another disk which is not in the storage pool; the WHS BDBB Add-in does the same for the backup database. Brakes and airbags are there, you just need to use them in the proper way. As with a modern car, WHS is advanced technology, and if it breaks it may require an expert to fix it.

    Having said that, apparently the disk failed somehow. The data is probably still there and, depending on the cause of the problem, probably recoverable in more than one way. Most likely the MBR has been damaged. I would advise you to mount the disk in another system, then use PartitionFindandMount to try to recover the data. This will not fix the problem on the disk, but it will allow you to recover the data without affecting the disk. Once the data is secure you can either reformat or try other tools (fixmbr) to fix the disk. If it's not an MBR problem, you can try file recovery tools such as PC Inspector (free) or Stellar Phoenix Data Recovery (not free).
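To test the MBR theory before reaching for repair tools, one option is to look at the first 512-byte sector of the disk. The sketch below is Python and works on bytes you have already read, for example from a sector dump saved with an imaging tool. How you obtain those bytes is left open, and the function name parse_mbr is hypothetical. It checks the standard 0x55 0xAA boot signature and lists the non-empty entries of the four-slot partition table.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts here in a classic MBR
ENTRY_SIZE = 16      # four entries of 16 bytes each


def parse_mbr(sector):
    """Return (signature_ok, partitions) for the first sector of a disk."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    signature_ok = sector[510:512] == b"\x55\xaa"
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + ENTRY_SIZE * index
        entry = sector[start:start + ENTRY_SIZE]
        # status, (CHS start skipped), type, (CHS end skipped), LBA start, length
        status, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "slot": index,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return signature_ok, partitions
```

A missing signature or an empty table would support the damaged-MBR theory. A sane-looking table with an NTFS entry (type 0x07) suggests the damage is further in, in the file system itself, which fits a partition that shows up as RAW.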
    Monday, October 26, 2009 2:38 PM
    Moderator
  • brubber, I understand what you are saying. The irony was that I was replacing a 750 GB disk with a 1.5 TB one, to be able to have everything duplicated, and still have one bay (out of 4) left to do non-storage pool backups. That the new disk now was the cause of the loss of the non-duplicated files rather than its savior is the kind of bad luck that seems to happen more often than statistically likely. I suppose I should be glad that none of my backup clients died at the same time.
     
    My point was not that a lot of data had been lost, but that WHS did not help at all. As I said, it took me a lot of misery even to get back into the box, and then there was nothing, really nothing that could assist in acknowledging or assessing the damage, other than literally hundreds of 'unexpected error' messages. It is great fun having to click away fifty identical dialog boxes to click a repair button that brings ten more. The built-in 'repair' functions were completely pointless: I can accept that not everything is easily repairable, but I cannot really understand why the most telling feedback is a crash of the diagnosing and repair tool, and, again, more 'unexpected error' messages. That level of functionality in a system designed to deal with mass storage, and with the system disk still completely intact, is extremely disappointing in my opinion.

    All the same I will gladly try your suggestions for the dead disk and thank you for them. 


    Monday, October 26, 2009 10:17 PM