How to restore after a failed data drive

  • Question

  • Sorry it's a long post but I figure it's better to lay it all out at once and try to answer any questions in advance.

     

    I've searched, honestly I have, but I can't seem to find an answer that spells it out clearly enough to let me proceed. I found many posts on failing OS drives, but that's not what I'm facing. I simply have a failed data storage drive and don't want to do anything foolish and lose data. I do think I have an ace in the hole, so to speak, but I don't want to use it if I don't have to. I just want to find out the best way to recover based on where I am. Currently the server is running chkdsk and is unavailable, so please bear with me if I get a few names wrong.

    I have a home-built server with the following:

    160 GB IDE OS drive (qty 1)

    500 GB SATA drives (qty 4), 1 failed

    Duplication on for everything

    9 computers backed up to server.

     

    I have noticed recently that my server would just seem to lock up and not let me log in. I used remote access (RM) and the Event Viewer reported a failing drive. So I pulled out an external 500 GB USB drive and hooked it up to the server. (I don't remember adding it to the storage pool, although I probably must have.)

    After hooking up the external drive I used the WHS BDBB add-in and I believe I duplicated the backup database. If there was an option to copy the database instead of duplicating it, I probably used that option (I don't remember and have no access to the server). At any rate, when I hook this external USB drive up to my desktop it has a single folder named BDBB; inside that is a dated folder (2010_3_13), and inside that dated folder is a bunch of .dat and config.dat files. This makes me believe that I have a valid copy of all the recent backups up until March 13th. I stored this external drive and have not hooked it up to the server since I made the backup.

    Today (March 21) I hooked up a second USB drive and copied off ALL my shared folders. On my desktop PC these files are accessible and good. So I think that all my data is safe except for the last week's worth of backups.

    After backing up the shared folders I went looking for the failing drive. I figured it was as good a time as any to also build a wireframe diagram in Disk Manager. So I started a routine of disconnecting one 500 GB drive at a time and booting up the server. About halfway through this process the failing drive just gave up. I mean the server stopped booting, or at least appeared to stop booting when using RM. After multiple hard reboots I gave up, hooked up a monitor, keyboard and mouse, and saw that it was stuck doing a chkdsk.

    Long story short, if I unplug the failing drive the server will boot fine. With the failing drive connected it stops and tries to run chkdsk. Currently the server is running chkdsk and reporting multiple errors.

     

    So this is where I am and where I'm unsure about the safest way to proceed.

    Should I let it finish chkdsk and then try to use the console to remove the drive?

    Assuming I succeed in using the console to remove the drive, do I just shut the server down, swap the drive out, and add the new drive to the storage pool? In this scenario do I stand a decent chance of saving all my files?

    Or should I just shut it down and replace the drive? Then boot up and use the console to remove the old drive. I'm assuming based on Ken's FAQ that this scenario will preserve my shared data but I have a chance to lose some if not all of my backups.

    Assuming that I apply the "just stop the chkdsk and replace" method, how easy is it to delete and then restore my older backups using BDBB? Has anyone done this? I'm not as worried about my shared folders, as I know I can use my USB drive and feed all that back through one of my desktops.

    Wow! Now to add to the dilemma, as I'm finishing typing this I just got a popup that the server is back online. Now I'll want to go and play with it, but I also want to wait for some answers from the community before I do so... oooh, sometimes I hate computers. :)

    Thank you for any and all help. I love this forum albeit until now as a lurker.  

     

     

    Monday, March 22, 2010 5:19 AM

Answers

  • Since your shares are backed up, disconnect the disk and reboot your server, then remove the missing disk using the console. You may be warned that you will lose files or backups; you should accept this and proceed with the disk removal. Once the disk has been removed, if you were warned that you would lose backups, you should use the repair button in the console to repair the backup database. If you were warned that you would lose files, you'll either want to sort out what exactly was lost and replace just those files, or you'll want to just copy your share backup over your current shares.
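
For the "sort out what exactly was lost" step, here is a minimal Python sketch. It assumes the share backup (for example the USB copy) and the current shares are both reachable as ordinary folders from a desktop PC. The function name `missing_from_shares` is hypothetical; nothing here is part of WHS or any add-in, and it compares files by relative path only, not by content.

```python
from pathlib import Path


def missing_from_shares(backup_root, shares_root):
    """Return relative paths of files that exist under backup_root
    but have no file at the same relative path under shares_root."""
    backup_root = Path(backup_root)
    shares_root = Path(shares_root)
    missing = []
    # Walk every entry in the backup copy, keeping regular files only.
    for path in backup_root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(backup_root)
            # A file counts as lost if the share has nothing at that path.
            if not (shares_root / rel).is_file():
                missing.append(rel)
    return sorted(missing)
```

Calling it with the backup folder first and the share folder second returns the list of files to copy back. If the list is long, copying the whole share backup over the current shares, as suggested above, is probably simpler.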

     

    You can restore the saved copy of the backup database if you like, using the BDBB add-in, but is there anything essential in that saved copy?


    I'm not on the WHS team, I just post a lot. :)
    • Proposed as answer by kariya21 (Moderator) Tuesday, March 23, 2010 1:55 AM
    • Marked as answer by MyAvatar Wednesday, March 24, 2010 4:31 AM
    Monday, March 22, 2010 11:25 AM
    Moderator

All replies

  • Thanks Ken,

    Unfortunately I did what people do and fiddled with it. After chkdsk finished it took another 5 minutes to finish "applying system settings". After that came another 5 minutes of waiting for the desktop icons.

    After this I found that the console would not launch because of a problem with Disk Manager. So instead of hitting the skip button I went into Add/Remove Programs and removed Disk Manager. One reboot and many clicks on "Skip this process" later, I now have almost no tabs across the top of my console. So I started a server restore. Once I get the console back up and running I'll follow your steps and post if anything else pops up.

    As far as the backup database goes, I've been debating that same question. But I'll probably restore it, as I like to save the install image of any new computer I build in case I have to do a complete re-install. Maybe it's time to re-evaluate that strategy though.

     

    Monday, March 22, 2010 4:49 PM
  • I reinstalled the server and now I'm not quite sure where I am, as I'm getting Windows errors when the server boots up.

    Here's what I've done.

    1. Removed the failed 500 GB drive.
    2. Added a new 500 GB drive.
    3. Did a server reinstall.
    4. Had many errors when patching.
    5. Did a second server reinstall.
    6. Still getting errors, but kept applying patches until there were none left.

    What I did not yet do

    1. Did not remove the failed drive from the console.
    2. Did not install any add-ins.
    3. Did not run any backups. I disabled them before I caused all this, but the service isn't running anyway so I couldn't run them if I wanted to.
    4. Did not attempt to repair backups.

    In Event Viewer the errors seem to be all in the Home Server and the Application logs.

    1. Faulting application demigrator.exe, version 6.0.2423.0, faulting module deutil.dll, version 6.0.2423.0, fault address 0x0001a2cc. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
    2. Faulting application whsarch.exe, version 6.0.2423.0, faulting module deutil.dll, version 6.0.2423.0, fault address 0x0001a2cc. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
    3. Unexpected error 0x2 from GetVolumeNameForMountPoint: The system cannot find the file specified. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    This is just a sample, but they all seem to be related to either demigrator, whsarch, or GetVolumeNameForMountPoint. Is this because I still need to remove the failed drive and rebuild the backups?
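
The faulting-application entries quoted above can be grouped mechanically. This is a rough Python sketch, assuming the event text has been copied out of Event Viewer into a string. The regular expression is fitted only to the two sample lines in this post and the function name `summarize_faults` is made up for illustration; other event formats would need their own pattern.

```python
import re

# Pattern fitted to the "Faulting application ..." lines quoted in this post.
FAULT_RE = re.compile(
    r"Faulting application (?P<app>\S+), version (?P<app_ver>[\d.]+), "
    r"faulting module (?P<module>\S+), version (?P<mod_ver>[\d.]+), "
    r"fault address (?P<addr>0x[0-9a-fA-F]+)"
)


def summarize_faults(log_text):
    """Map each (faulting module, fault address) pair to the list of
    applications that crashed there, in order of appearance."""
    apps_by_fault = {}
    for m in FAULT_RE.finditer(log_text):
        key = (m["module"], m["addr"])
        apps_by_fault.setdefault(key, []).append(m["app"])
    return apps_by_fault


SAMPLE = (
    "Faulting application demigrator.exe, version 6.0.2423.0, "
    "faulting module deutil.dll, version 6.0.2423.0, fault address 0x0001a2cc.\n"
    "Faulting application whsarch.exe, version 6.0.2423.0, "
    "faulting module deutil.dll, version 6.0.2423.0, fault address 0x0001a2cc.\n"
)

for (module, addr), apps in summarize_faults(SAMPLE).items():
    print(module, addr, apps)
```

On the two sample entries this puts demigrator.exe and whsarch.exe under the same key, since both report deutil.dll at fault address 0x0001a2cc. That only shows the crashes share a location; it says nothing by itself about the cause.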

     

    Should I mark this thread closed and open a separate thread for this?

    And finally,

    Am I supposed to have lost all my users? Because they're all gone.

    Thanks

    Tuesday, March 23, 2010 7:11 PM
  • I'm not sure why you decided on the procedure you followed, but, umm, it was the wrong thing to do. Off the top of my head, your best bet right now is probably to remove all the disks from your server except the newest one. Check the newest one for data per this FAQ, then do a new server installation (assuming it's empty). Then take all the other disks, and follow the instructions in that FAQ to copy your files back into your shares.

     

    To answer your question about losing users: yes, that's normal. See this FAQ which gives a brief overview of what you will lose in a reinstallation, along with some information about the reinstallation process itself, and this one for more information about what you might lose when a disk dies.


    I'm not on the WHS team, I just post a lot. :)
    Tuesday, March 23, 2010 9:01 PM
    Moderator
  • Thanks Ken, I'll try the FAQ when I get home tonight, but where did I go wrong? Did I misunderstand the process? I thought I was supposed to replace the failed drive and then do a server reinstall. It's funny how we think we understand a process and then manage to mess it up anyway. :( 

    Oh well I guess this is how we learn not to mess things up. :)

     

     

    Tuesday, March 23, 2010 9:45 PM
  • Well, you didn't need to reinstall. All you needed to do was disconnect the failed drive, then use the console to remove the (now missing) drive from the storage pool, assuming the one you disconnected was really the one that had failed.
    I'm not on the WHS team, I just post a lot. :)
    Wednesday, March 24, 2010 1:21 AM
    Moderator
  • Ah, I see where the disconnect happened. When I was trying to identify the failed drive by unplugging drives and booting the server, my OS drive became corrupted, hence the need to reinstall in the middle of a data recovery. My box is old and running 2 Maxtor 150 SATA cards, and I couldn't find any other way to identify which drive was which.

    At this point I think I've hosed it up enough that rather than try to recover the data I'm just going to start it over. Fortunately I was at least smart enough to copy off all my shared folders before I started this whole saga so all is not lost. I'm going to play with it a bit more and see what I can do but I already expect a fresh install coming.

     

    Ken, thanks for your patience in trying to help me get through this. I think this one can be marked as answered, and I'll open a new thread in the appropriate forum if I need help with the new install.

    Thank you 

    Wednesday, March 24, 2010 4:31 AM