WHS Recovery with 2 dying (not dead) hard drives

  • Question

  • I have a WHS with a bunch of drives in it.  Unfortunately, I bought several of the new Seagate 1.5TB drives and now two of them are failing when loaded heavily for more than a few minutes.  All the data on the server is in duplicated shares, but that doesn't help with a double failure.  The drives generally come up, and as long as I don't access them continuously for more than a few minutes at a time they are fine.  On the HW debug side I've replaced the cables and the power supply and moved the drives to different controllers, so I'm pretty confident it is a drive problem.

    I would like to remove them from the WHS, but whenever I start the remove operation it pounds them heavily and one of them fails after 10-15 minutes.  The server seems to want to copy the data onto the other new drive in preference to my older, fuller drives.  My first thought was to somehow tell the server not to move files onto the other failing drive (only onto good drives).  If I could do that, I think I could run remove repeatedly until all the files are moved.  Unfortunately I can't figure out how you would do that.  I can't seem to find any other potential solutions anywhere on the internet; the closest I've found was on this site's FAQ - to mount the drives in another box and pull the data out of the DE/shares directory.  In that case, I'm not sure what that would do to the server.  Would it lose the PC backups on it, and what happens to the internal consistency with the loss of 2 drives?  I didn't want to try it without some input.

    Other than that I'm out of ideas, so I'm looking for suggestions (or hints on what to look for), even if they are fairly complicated.  Thanks in advance.
    -Cloud
    Monday, June 29, 2009 5:00 PM

Answers

  • I just realized I never came back and reported on my outcome.  Thanks for the help and I hope this will help someone searching for solutions in the future.

    The primary problem with the HDDs appears to have been heat related.  With the drives sitting on the desktop (next to the server) they ran well enough for me to pull backups of both the backup database and the main file storage.

    I did put them back in the case with some Scythe 120 CFM (extremely loud) fans and they worked for much longer without failure.  However, at a failure a week the data is worth more than the risk, and the two drives in question are now doorstops.  They sometimes fail at temperatures as low as 36C, which is pretty horrible.  An 18% failure rate makes those Seagates not quite as cheap as they look :(.

    I have new (quiet, high flow/pressure) fans in my case and all the other drives are still running fine.  Either way I'm now doing backups on a regular basis.  Now all I need is an automated way to do it so I won't get lazy about it.

    Thanks again.
    -Greg
    Tuesday, August 11, 2009 3:42 PM

All replies

  • Hi,

    If you copy the backup database off the server before removing the two drives, you may be able to save your backups, provided the failing drives have not already introduced errors into the backup database.  The more you access these drives while they are failing, the more risk you take. I just want to make that clear.

    The steps for making a copy of the backup database can be found in the technical brief for backup and restore:

    Saving a Copy of the Backup Database

    The Windows Home Server backup database is not duplicated by Windows Home Server Drive Extender. So if you lose a single hard drive on your home server, you could possibly lose all of the backups of your home computers.

     

    You may want to periodically copy the entire backup database from your home server to an external hard disk that you attach to your home server. The external hard disk should not be added to the Server Storage on your Windows Home Server.

     

    Important

    The cluster data files stored in the backup database can grow to 4 GB, so it is important that the external hard disk is formatted as NTFS to support copying these large files. Some file systems, such as FAT32, limit a single file to just under 4 GB.
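
    A quick way to see whether any file would exceed a destination's per-file limit before starting the copy is sketched below. It is only an illustration: the function name is made up here, the default limit assumes FAT32's ceiling of 4 GiB minus one byte, and it assumes Python is available on the machine doing the check.

```python
import os

# FAT32 records file sizes in a 32-bit field, so a single file
# can be at most 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def files_too_large(root, limit=FAT32_MAX_FILE_BYTES):
    """Return sorted (path, size) pairs under root whose size exceeds limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized)
```

    If this returns an empty list for the backup database folder, the per-file limit should not be what stops the copy; NTFS remains the safer choice either way.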

     

    The backup database is stored entirely in the folder D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}.

     

    *       To copy the backup database from your Windows Home Server

    (In this example, assume that the external hard drive is given a drive letter of E:)

     

    1.     Run mstsc.exe to start a Remote Desktop Connection session to your home server.

    Caution

    Be careful when using a Remote Desktop Connection to your home server. You can damage Windows Home Server functionality if you use it incorrectly.

     

    2.     Plug in an external hard drive to your home server (do not add it to the Server Storage through the Windows Home Server Console).

    3.     Open a Command Prompt: click Start, then Run, and type CMD.

    4.     Type net stop PDL.

    5.     Type net stop WHSBackup to stop the Windows Home Server Backup service.

    6.     Copy the contents of D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4} to the external hard drive E:.

    7.     Type net start WHSBackup to restart the Windows Home Server Backup service.

    8.     Type net start PDL.

    The only relevant settings that Windows Home Server backup stores in the registry on the home server are Backup Time and Automatic Backup Management.
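
    For anyone who wants to script the steps above rather than type them, a minimal sketch follows. It assumes Python has been installed on the home server (it is not there by default) and that the script runs with administrator rights; the destination folder name E:\WHSBackupCopy and the function names are made up for illustration. Only the net stop/net start commands and the database folder come from the brief.

```python
import shutil
import subprocess

# Folder quoted in the technical brief; E: is the example drive letter.
BACKUP_DB = r"D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}"
SERVICES = ["PDL", "WHSBackup"]  # stopped in this order, restarted in reverse


def net(action, service, run=subprocess.run):
    """Run `net stop <service>` or `net start <service>`, raising on failure."""
    run(["net", action, service], check=True)


def save_backup_database(src=BACKUP_DB, dst=r"E:\WHSBackupCopy", run=subprocess.run):
    """Stop the services, copy the database folder, then restart the services.

    dst is a hypothetical folder name and must not exist yet,
    because shutil.copytree creates it.
    """
    stopped = []
    try:
        for service in SERVICES:
            net("stop", service, run)
            stopped.append(service)
        shutil.copytree(src, dst)
    finally:
        # Restart whatever was stopped, even if the copy failed part-way.
        for service in reversed(stopped):
            net("start", service, run)
```

    The try/finally is there because a failing drive may abort the copy, and the services should come back up regardless.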

     

    After you've done this, you can power down the server and remove the two failing drives. Copy your data from under the \DE directories on both drives using another PC, as mentioned.   When you bring your server back up, it will indicate that drives are missing. You will need to remove the drives via the console. When you do this, you will lose ALL the backups on the server.  Once you have finished removing the drives, added your replacement drives, and copied your data back to the shares, you can restore the database:

    Restoring a Backup Database

    You may want to restore an entire backup database, which you previously saved on an external hard drive, to your home server. Prior to restoring a backup database, you need to delete the existing backup database from the home server. Currently, there is not an option to merge backup databases into a single database.  

     

    *       To restore the backup database to your Windows Home Server

    (In this example, assume that the external hard drive is given a drive letter of E:, and it has a copy of a home server backup database that was previously saved as described in the Saving a Copy of the Backup Database section earlier in this document.)

    1.     Run mstsc.exe to start a Remote Desktop Connection session to your home server.

     

    2.     Plug in an external hard drive to your home server (do not add it to the Server Storage through the Windows Home Server Console).

    3.     Open a Command Prompt: click Start, then Run, and type CMD.

    4.     Type net stop PDL.

    5.     Type net stop WHSBackup to stop the Windows Home Server Backup service.

    6.     Delete the contents of D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}.  Do not delete the folder.

    7.     Copy the contents from the external hard drive E:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4} to D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}.

    8.     Type net start WHSBackup to restart the Windows Home Server Backup service.

    9.     Type net start PDL.
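
    The restore steps above could be scripted along these lines. This is a rough sketch, not an official tool: it assumes Python 3.8 or later is installed on the home server and that it runs with administrator rights, and the function names are made up for illustration. The paths and the net commands are the ones from the brief.

```python
import os
import shutil
import subprocess

BACKUP_DB = r"D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}"
SAVED_COPY = r"E:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4}"


def empty_folder(folder):
    """Delete everything inside folder but keep the folder itself."""
    with os.scandir(folder) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)


def restore_backup_database(saved=SAVED_COPY, target=BACKUP_DB, run=subprocess.run):
    """Stop the services, replace the database contents, restart the services."""
    run(["net", "stop", "PDL"], check=True)
    run(["net", "stop", "WHSBackup"], check=True)
    try:
        empty_folder(target)
        # dirs_exist_ok lets copytree fill the existing (now empty) folder.
        shutil.copytree(saved, target, dirs_exist_ok=True)
    finally:
        run(["net", "start", "WHSBackup"], check=True)
        run(["net", "start", "PDL"], check=True)
```

    Note that the existing database is deleted before the copy starts, so it is worth confirming that the saved copy on E: is readable first.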


    Thanks!

     


    Lara Jones [MSFT] | Program Manager
    Community Support and Beta | Windows Home Server Team
    Windows Home Server Team Blog
    Connect Windows Home Server
    Windows Home Server
    Monday, June 29, 2009 5:22 PM
    Moderator
  • Thanks,  I can probably do that without much problem.  I even have the hard drive I was going to use for that and didn't get around to :(.  I made a copy of the backup database a few weeks ago into server storage using an addon, so that might be a plan B if I can't make this work AND it's on one of the working/older drives ... In any case that helps, but I'm pretty worried about the stored data and server integrity moving forward.

    Can I (relatively) safely ignore the other errors introduced by removing both hard drives at once if I pull off the Backup Database and copy the files off the drives on another PC (using the info in "How to recover data after server failure")?  In other words, would I then have a copy of all the data that hasn't been corrupted to that point?  If not, what would I be losing in the process?  Would I need to reinstall the OS or can I just put my recovered data back and move forward?

    I'm still fuzzy on what the server will do when I press the "remove" button and accept its warning about data loss from multiple missing drives.  I'm hoping it will recover any data for which there is at least one copy on the drives?  If so I guess I have a plan.

    Thanks
    -Cloud
    Monday, June 29, 2009 5:59 PM
  • Can I (relatively) safely ignore the other errors introduced by removing both hard drives at once if I pull off the Backup Database and copy the files off the drives on another PC (using the info in "How to recover data after server failure")?  In other words, would I then have a copy of all the data that hasn't been corrupted to that point?  If not, what would I be losing in the process?  Would I need to reinstall the OS or can I just put my recovered data back and move forward?


    Your data will be on the drives and you will have a copy of the backup database. If the data on the drives is still readable, you should be able to perform the missing drive removal without losing the data on the other drives while having your original data from the removed drives intact under the \DE directories.

    If for some reason you cannot perform a missing drive removal with two drives at the same time, copy the data from both drives to another client, then place one drive back in the server while leaving the other out. Perform the missing drive removal for the absent drive, and then go through the console and remove the second drive. This will prevent the server from working both bad drives at once.
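
    Since these drives reportedly drop out after a few minutes of sustained access, copying the data off in short bursts with rests in between may be gentler than one long copy. The sketch below is only an illustration of that idea: the function name, the 120-second burst, and the 300-second rest are guesses rather than tested values, and it assumes Python is available on the PC doing the copy.

```python
import os
import shutil
import time


def copy_in_bursts(src_root, dst_root, burst_seconds=120, rest_seconds=300,
                   sleep=time.sleep, clock=time.monotonic):
    """Copy src_root into dst_root, resting the source drive between bursts.

    Files already at the destination with the same size are skipped, so the
    copy can simply be rerun after a drive drops out. Returns the source
    paths that could not be read on this pass.
    """
    failed = []
    burst_start = clock()
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            try:
                if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
                    continue  # copied on an earlier pass
                shutil.copy2(src, dst)
            except OSError:
                failed.append(src)  # unreadable for now; retry on a later pass
                continue
            # The pause is checked between files, so one very large file
            # can still run longer than burst_seconds.
            if clock() - burst_start >= burst_seconds:
                sleep(rest_seconds)
                burst_start = clock()
    return failed
```

    Rerunning the function until it returns an empty list gives a rough equivalent of repeating the copy by hand, without re-reading files that already made it across.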

    I'm still fuzzy on what the server will do when I press the "remove" button and accept its warning about data loss from multiple missing drives.  I'm hoping it will recover any data for which there is at least one copy on the drives?  If so I guess I have a plan.


    I could explain the entire process here but instead I'll copy a part from the Drive Extender technical brief.

    You can use the Windows Home Server Console to inform Windows Home Server Drive Extender that a missing hard drive will never be used. The Migrator service will enter a special repair mode, and it will inspect every tombstone file. If the tombstone had an alternate shadow on the missing hard drive, that link is removed from the reparse point on the tombstone. The Migrator service attempts to make an extra copy of the file later (so the file is duplicated again). If the master shadow was on the missing hard drive, the most recent alternate shadow is promoted to become the master shadow. If a file was not duplicated and the only shadow copy of the data was on the missing hard drive, that data is lost. The Migrator service still has work to do because a file with no remaining shadow files cannot be opened or deleted. If the Migrator service left the tombstone alone, it would continue to appear in the directory, and the user would have no easy method for deleting it. While in repair mode, when the Migrator service finds a tombstone file for which the only shadow was on a permanently missing hard drive, the Migrator service deletes the tombstone.
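
    The repair pass described above can be pictured with a toy model. This is not Drive Extender's actual code or data format; the class and function names are invented for illustration, and it only mirrors the rules in the quoted paragraph.

```python
from dataclasses import dataclass, field


@dataclass
class Shadow:
    drive: str        # drive that holds this copy of the file's data
    modified: float   # last-modified time, used to pick the newest alternate
    is_master: bool = False


@dataclass
class Tombstone:
    path: str
    shadows: list = field(default_factory=list)
    needs_duplication: bool = False


def repair_after_missing_drive(tombstones, missing_drive):
    """Return (surviving tombstones, paths whose data is lost)."""
    kept, lost = [], []
    for t in tombstones:
        remaining = [s for s in t.shadows if s.drive != missing_drive]
        if len(remaining) == len(t.shadows):
            kept.append(t)          # file had no shadow on the missing drive
            continue
        if not remaining:
            lost.append(t.path)     # only shadow is gone: tombstone is deleted
            continue
        if not any(s.is_master for s in remaining):
            # The master was on the missing drive: promote the newest alternate.
            max(remaining, key=lambda s: s.modified).is_master = True
        t.shadows = remaining
        t.needs_duplication = True  # an extra copy is made later
        kept.append(t)
    return kept, lost
```

    Running it once per missing drive shows why a double failure matters: a duplicated file whose two shadows both sat on the failing drives survives the first removal and is lost on the second.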

     


    Lara Jones [MSFT] | Program Manager
    Community Support and Beta | Windows Home Server Team
    Monday, June 29, 2009 6:28 PM
    Moderator
  • Perfect answers, now all I have to do is try it when I get home.  I very much appreciate the help and hope to post back on a successful recovery.
    Monday, June 29, 2009 7:26 PM
  • Did you apply the firmware upgrades for the disks?
    Also, have precautions been taken so that they do not overheat in a cramped enclosure?
    Best greetings from Germany
    Olaf
    Monday, June 29, 2009 9:55 PM
    Moderator
  • Olaf,
    The drives I have are CC1H and CC1J so they don't need the firmware upgrades according to Seagate.  These are actually the 2nd and 3rd failures I've seen out of my population of 11 drives across a number of machines.  They all have this same failure mode (start dropping out under load after 2-6 weeks of use and get rapidly worse).

    I've thought about the overheat issue, but not done anything about it because they are in actively cooled bays (Cooler Master 4-in-3s) and the device manager addon shows them not getting above 40-41C at full load.  I'll pull them out and run them on the desktop today; it's worth a shot and a lot less work than a full data recovery operation :-).

    No progress last night as I'm trying to pull enough data off my external drive to fit the backup.  It's USB so it's slow and the backup database is over 600GB!  At this rate it may well be days to pull the full data set, so depending on how frustrated I get... I might be buying an eSATA drive.

    Thanks for all the data & thoughts :)
    -Greg
    Tuesday, June 30, 2009 1:27 PM