Unanswered system drive failure: what happens in this case?

  • Thursday, January 24, 2008 11:05 PM
     
     

    Here's a hypothetical scenario that will happen...

     

    Let's say I have a system drive failure, and let's say I have backed up that drive manually through some means, or have used PP1 once it's available.  I haven't backed up all of the drives because I have no realistic way to back up 12TB.  The reason I back up the primary drive is that I don't want to have to completely rebuild everything in case it fails.

     

    Now, let's say my last backup is a week old.  Not unreasonable to assume.  In that week, let's say 1000 files were added, 100 updated (size changed), 100 deleted, 10 renamed, 10 moved, and 10 folders deleted.

     

    When I put in my new drive and restore, what will happen to all of the shadow copies of those modified files during that week?  What about those deleted?  What about those renamed?  Or moved?

     

    How will WHS recover the added files, if at all?  Or will it leave them as dangling orphans, using up space but never to be seen?  For those updated, will the file sizes of the tombstones be updated to reflect the new size?  Will those moved and renamed be updated as well?  Or will this end up reporting lost files, which aren't really lost because there are orphaned shadow copies out there (which could easily be found by comparing file size, creation date, etc.)?


    Clearly it has all of the data it needs to recover from adds and deletes, which in turn also covers renames and moves.  Updated files are easy as well (compare dates and sizes).  For example, a combinatorial tree diff would reveal the files missing from D:.  Then the appropriate tombstone could be created on D:, and all is well.  The reverse diff would also apply; any file or directory under D: that did not exist on any of the other drives would mean that it was deleted, so it should be removed.  A move or rename would show up as an add plus a delete.
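    A minimal sketch of the tree diff described above, not anything WHS is known to do. It assumes the tombstones on D: and the shadow copies on the secondary drives have already been read into plain dictionaries keyed by share-relative path; the `FileInfo` type and the `diff_tombstones` name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record: size in bytes and last-modified time."""
    size: int
    mtime: float


def diff_tombstones(tombstones, shadows):
    """Compare restored tombstones against shadow copies found on data drives.

    Both arguments map a share-relative path to a FileInfo.
    Returns three sorted lists of paths:
      missing - shadow exists, tombstone does not (file added after the backup)
      stale   - tombstone exists, shadow does not (file deleted after the backup)
      updated - both exist but size or date differ (file changed after the backup)
    """
    missing = sorted(p for p in shadows if p not in tombstones)
    stale = sorted(p for p in tombstones if p not in shadows)
    updated = sorted(
        p for p in tombstones.keys() & shadows.keys()
        if tombstones[p] != shadows[p]
    )
    return missing, stale, updated
```

    A rename or move shows up here as one entry in `missing` and one in `stale`, which matches the "add plus delete" reasoning in the post.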

     

    Will WHS Drive Extender do this?  Perhaps nightly after Chkdsk is run?  If not, is this a bug, or a limitation?  All of the information is there to do so safely from what I see; especially if a checksum is stored in the tombstone somewhere (that would make comparison really simple and fast).
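    To illustrate why a stored checksum would make the comparison simple: if each tombstone carried a content hash (purely an assumption here; nothing in this thread confirms that WHS stores one), a stale tombstone and an orphaned shadow copy with the same hash could be paired up as a rename or move instead of being treated as an unrelated delete and add. The function names below are hypothetical.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Content hash used as the (assumed) checksum stored with a tombstone."""
    return hashlib.sha256(data).hexdigest()


def pair_moves(stale, missing):
    """Pair stale tombstones with orphaned shadows that have the same checksum.

    stale:   {old_path: checksum} for tombstones whose shadow copy is gone
    missing: {new_path: checksum} for shadow copies that have no tombstone
    Returns (moves, deletes, adds):
      moves   - list of (old_path, new_path) pairs with identical content
      deletes - old paths with no matching content anywhere
      adds    - new paths not explained by a move
    """
    by_sum = defaultdict(list)
    for path, digest in sorted(missing.items()):
        by_sum[digest].append(path)

    moves, deletes = [], []
    for old, digest in sorted(stale.items()):
        if by_sum[digest]:
            moves.append((old, by_sum[digest].pop(0)))
        else:
            deletes.append(old)

    adds = sorted(p for paths in by_sum.values() for p in paths)
    return moves, deletes, adds
```

    Without a stored checksum, the same pairing could fall back to size and date, at the cost of more false matches.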

     

    Thanks in advance for answering this.


    Regards,

    Ryan Rogers

     

All Replies

  • Friday, January 25, 2008 3:57 AM
    Moderator
     
     
    Microsoft didn't design Windows Home Server to require backing up the system drive. In normal operation there will be very little on that drive: the operating system, any installed add-ins, tombstones, and some additional control structures, all of which can be recreated during or within a few minutes after a server reinstallation. The main time consideration for the reinstallation will be the reconstruction (or validation) of the tombstones, which could take days (literally) in a 12 TB system. Your system drive is the least important drive in your system.

    In addition, I've tried backing up the system drive. I only tried a couple of times, but restoring tombstones just doesn't work if they're out of date. You would need a full backup of all drives, and I agree that that's not practical. The "Windows Home Server" way is to turn on share duplication for everything.
  • Friday, January 25, 2008 5:15 AM
     
     
    I have to back up what Ken is saying.  I've had to do a repair install twice.  The first time was in going from RTM to OEM, the second time was when I turned off Whiist and the whole thing stopped working.  The first time I had some problems I believe were caused by a DVD drive problem; the second time all went well.  In both cases NO data was lost on any of the drives in the storage pool, other than PC Backups. It's nothing to worry about.  I have to admit I used to be one of the worst worriers about this.  I'm far from cavalier about it now, but it's not something I lose sleep over.

     

  • Friday, January 25, 2008 10:11 AM
     
     

    I agree regarding the drive being the least important.

     

    But I would not say it is unimportant if building the tombstones is going to take days.  It still needs a backup.  My customers will not accept that over my RAID solutions that have them back up and running within minutes, or, with my hot spare offering, never even going down!  Granted, while restoration copying is going on the system isn't at full performance capacity, but at least it's usable.


    It may also not be fair to compare DE vs. RAID wrt disaster recovery.  Apples vs. Oranges.  But MS compares DE vs. RAID when it comes to simplicity to manage and ease of extendibility.  So if the shoe fits.  And my customers will care about the difference if they lose HD0.

     

    I'm not saying MS is stupid here with their design.  It's an elegant design that will work quite well for intended target audience.  It's actually quite brilliant.  I'm just pushing things a bit with my customer base.

     

    To be honest, I wouldn't even be looking at DE if I didn't get support calls over RAID today.  Until I deploy a product in the real world, I'm not sure if the benefits of DE wrt these calls will outweigh what I'm losing wrt disaster recovery (DR).  But I have a gut feeling it will.  And lately since I've been losing sales to the Drobo, it seems the customer base really is ready for this, and willing to give up a bit of storage and speed for the ease of management.

     

    While clearly DE over RAID5/6 is a braindead concept, I'm beginning to think that RAID1 mirroring the boot disk may be an excellent idea.  It really avoids a lot of these DR issues.  Once this drive is mirrored, assuming all shares are duplicated, every byte on the computer will be redundant.  You can lose any single HD, and the system keeps humming right along.  That is a big selling feature to show that I can pop-out any of those hot-swappable drives and everything keeps chugging right along.

     

    Of course...then I'll have to figure out how to get hot-spare working.... ;-)

     

    - Ryan

  • Friday, January 25, 2008 11:08 AM
    Moderator
     
     
     ryan.rogers wrote:

     I'm beginning to think that RAID1 mirroring the boot disk may be an excellent idea.  It really avoids a lot of these DR issues.  Once this drive is mirrored, assuming all shares are duplicated, every byte on the computer will be redundant.  You can lose any single HD, and the system keeps humming right along.  That is a big selling feature to show that I can pop-out any of those hot-swappable drives and everything keeps chugging right along.

     

    Of course...then I'll have to figure out how to get hot-spare working.... ;-)

     

    - Ryan

    That's exactly what I had in mind once I'm confident that all of these data corruption issues have been cleared. Perhaps I'll even wait for v2, considering that this one will possibly be based on Server 2008. Until that time I'm happy running the OEM trial version.

     

    I wonder if anyone has already tried mirroring for the boot disk.

  • Friday, January 25, 2008 11:41 AM
     
     

    I have to think somebody has tried it.  There is no reason it wouldn't work.  A true hardware RAID solution would be completely invisible to Windows.  Now software RAID...that might not be a good idea; but I plan on evaluating it.  I wonder if the Disk Manager even has it...since it's based on Server 2003, it should.  However, I've had a lot of issues with corruption w/ software RAID under Windows.  The chance of a problem occurring from that IMO is far larger than the chance of the HD crashing.

     

    Yeah the data corruption issue is at the top of my list right now.  I'm actually *very* glad I got busy last year with contract work and never had a chance to start investigating WHS seriously until recently.  It could have been a complete disaster for me if I had put product in the field with this issue.  The fact that MS can repro it with a harness is key, a fix will come eventually.

     

    My only concern is that it may require some significant trade-offs and/or loss of functionality to truly fix.  If this was a simple fix, we would have seen it by now.  For example, if they have to make DE changes to shadow en-masse, or otherwise delay their shadow updates, you are losing important functionality from a disaster recovery (DR) standpoint.  Right now the window of opportunity for a loss of data due to the shadow not being propagated is virtually nil.  The larger that window becomes, the less robust the entire solution is from a DR standpoint.

     

    Ryan

     

     

     

  • Friday, January 25, 2008 4:29 PM
     
     

    Ken, a few quick questions.  You stated:

     

    "In addition, I've tried backing up the system drive. I only tried a couple of times, but restoring tombstones just doesn't work if they're out of date."

     

    That partially answers my question, but could you elaborate on this?  Did the restore work, and you simply were missing files (i.e. changes made post-backup that would of course not be there)?  Did the drive get detected as unhealthy?  Did DE even attempt to recover the lost files by scanning the secondary data partitions?

     

    Also, what happened to the orphaned / dangling files on the secondary drives?  Did Windows detect them in any way and delete them, or do they hang around forever?  Will it restore them, since it has their data, filename, location, etc.?

     

    What I'm wondering is if this could be a good opportunity for a utility which performs synchronization in this case?  It could scan and diff the secondary drives against the primary, and reload the files missing from D: back through the normal duplicated share, and thereby force re-synchronization?  After this first pass, it could then remove orphaned tombstones, but I assume WHS has something to do that automatically, so that's less important (since they would have been the result of renames, moves, or deletes that were lost between backup and restore, they can be safely deleted, as the original orphaned shadow copy would have been restored).
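    A rough sketch of what such a two-pass utility might plan, under loud assumptions: it treats the restored D: share tree and each secondary drive's shadow tree as ordinary directories with matching relative paths, which is a simplification for illustration and not a description of how Drive Extender actually lays out its data. `scan` and `resync_plan` are hypothetical names, and the sketch only builds a plan; it does not copy or delete anything.

```python
import os


def scan(root):
    """Return {relative_path: size_in_bytes} for every file under root."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            found[rel] = os.path.getsize(full)
    return found


def resync_plan(primary_root, secondary_roots):
    """Build an ordered list of (action, relative_path, source_root) steps.

    Pass 1: files present on a secondary drive but absent from the primary
            are queued to be re-copied through the share ("recopy").
    Pass 2: entries on the primary with no copy on any secondary drive are
            queued for removal ("remove_tombstone").
    """
    tombstones = scan(primary_root)
    shadows = {}
    for root in secondary_roots:
        for rel in scan(root):
            # Remember only the first drive that holds a copy.
            shadows.setdefault(rel, root)

    plan = []
    for rel in sorted(shadows):
        if rel not in tombstones:
            plan.append(("recopy", rel, shadows[rel]))
    for rel in sorted(tombstones):
        if rel not in shadows:
            plan.append(("remove_tombstone", rel, None))
    return plan
```

    Running the recopy pass first matters for the reason given in the post: a renamed file's content comes back under its new name before the tombstone for the old name is dropped.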

     

    Thoughts?  I'd be a bit shocked if WHS doesn't have something like this built in, since the data is there, and this can happen in the real world w/o restoring a backup of HD0.  For example, chkdsk could result in lost sectors on HD0 which result in lost tombstones.  WHS has all of the data to recreate and recover from this.  Will it?  And if it will when part of the drive is lost, why can't it when the entire drive is reverted to an earlier state?  The same code addresses both problems.

     

    Thanks,

    Ryan

     

     

     

  • Friday, January 25, 2008 5:09 PM
     
     

    Now software RAID...that might not be a good idea; but I plan on evaluating it.  I wonder if the Disk Manager even has it...since it's based on Server 2003,

    I would agree that it's not a good idea :-) And be wary, DiskMan can trash a WHS install as it has no understanding of DE.

     

    As you've probably read in the DE Tech brief, hardware raid is not supported for WHS.

     

    WHS has the mechanism to recover the data in the event of a reinstall, as that is part of the process; but restoring it to its previous state just moves the goalposts - how would it know that it would need to rescan? I've not had a situation where it's had to recover from an error chkdsk has identified, but I've not heard of a mechanism to fix that.

     

    The WHS Toolkit can scan and report certain errors, so worth a look if you haven't already.

     

    I agree with you regarding the fix, as it's obviously going to require some lower level changes.

  • Friday, January 25, 2008 5:56 PM
     
     

     

    Actually I did read the tech brief a while back, but thought it said not recommended vs. not supported.  There is quite a big difference.

     

    Just checked, and yep, it says not recommended:

     

    "It is highly recommended that you not use hardware RAID technologies for your home server. Recovering from hard-drive failures becomes increasingly complex when hardware RAID systems are used. The recommended approach is to use multiple hard drives that are configured as Just a Bunch of Disks (JBOD)."

     

     

    I doubt MS could claim this is unsupported, especially if you have the technical prowess to handle the "increasingly complex" issues. ;-)

     

    Besides, the term "RAID technologies" is far too broad.  There is a huge difference between RAID 1 mirroring one drive and RAID 5 striping the entire data array.  If you are an idiot and you build your entire DE segment over top of a RAID 5 array, you get what you deserve.  But that is a far cry from using RAID 1 on HD0, which as it turns out, has zero redundancy within the system as designed (unless you consider reinstalling the OS "redundancy" ;-).

     

    Using RAID 1 on disk 0 will only make the system more reliable, and far, Far, FAR easier to recover from a drive failure.  Not less so.  If MS says this is not supported, it is because some marketing bozo is in charge, and not an engineer with a clue.  It also makes the system absolutely NO harder to maintain in the normal non-failure case, as it's 100% transparent until you have a failure, and if you ever do, you'll be glad you have it!!!  You will be able to recover in potentially minutes vs. hours to days!  What happens if you lose a second hard drive during that recover window that takes hours to days?  You got it...DATA LOSS.

     

    There is not only no reason for it to be unsupported, it should in fact be wholeheartedly *recommended*.  But they can't do that, because they have built an alternative to RAID 5.  That mixed message would never be allowed past the Marketing Police.  To the average clueless home user, RAID is RAID.  MS can't muddy the picture by saying "RAID xyz is a horrible idea, but RAID abc is a great idea".  But sadly, even though you'll never hear them admit it, that is *exactly* the case.  RAID 5 on WHS is pointless.  Self-Defeating.  Plain Stupid.  However, RAID 1 on WHS addresses a very significant design limitation of the Drive Extender implementation.  It is, in fact, a good idea.  The only downside is you lose a drive's worth of space and spend some more $$$, but with external drive support, this is a welcome trade-off.  But you get redundancy...redundancy that is extremely important for some OEM providers like myself who are building whole-house automation and media controllers that need to be up 24/7/365.  Living for hours to days while the system rebuilds is not an option, and that even assumes you have the spare HD lying around.  With hardware RAID 1, the system doesn't even hiccup when the HD fails, much less go down for days on end.

     

    Only if somebody from Microsoft posts that RAID, of any type, is officially not supported and is willing to put it in formal writing will I believe that this is the case.  And if they do, they had better bring a good reason with them, 'cause so far I've yet to find one.  I understand not recommended.  I can live with that because I know what I'm doing.  But that's different than not supported.

     

    I'm not saying this solution is for everybody, and mass-market devices like HP's won't need this.  But OEMs serving niche markets like myself should have the option to do this and have it be supported.  Otherwise, we can't use WHS, and have to stick with XP Pro which can do RAID 1.

     

    Ryan

  • Friday, January 25, 2008 6:25 PM
    Moderator
     
     
    Look around, brubber. Lots of people have tried RAID arrays of various "stripes". (Hah. Hah. ) I did it the first time I built a WHS PC, putting my entire server on a RAID 5 array.

    It's pretty easy to install on a RAID array:
    • Make sure your HBA has solid Windows Server 2003 drivers (Windows Server Catalog listed, please).
    • Install a floppy drive.
    • Prepare an "F6 floppy" on some system, containing your RAID HBA drivers. (You can try a USB flash drive, but I've never managed to get that to work...)
    • Install only the drives that are in the array, and build the array. Perform any other BIOS setup that may be required at this time as well. (You might be able to get away with having other drives installed. Let us know if it works for you. :-) )
    • Boot off the installation DVD into the initial graphical setup. Setup will probably not detect your HBA/array, because driver support is limited.
    • Insert the floppy, load additional drivers, and point to the floppy. (Setup should see the array as a single drive.) If asked whether you want to carry the drivers forward to the next stage, you can say yes, but it won't matter.
    • Proceed through the rest of the initial graphical setup, to the point where setup reboots.
    • You will now enter text-mode setup for Windows Server 2003. Near the beginning, press "F6" at the "Press F6" prompt. If you miss, reboot and try again.
    • Load the drivers again.
    I assume y'all can handle the rest...
  • Friday, January 25, 2008 6:37 PM
    Moderator
     
     
    Ryan, there's a "rebuild" function in Drive Extender somewhere. It will rebuild a missing D: drive from scratch. But if the drive is there, and all the control structures are there, why should it assume there's anything wrong?

    As for my abortive attempts at restoring tombstones, I fried my backup database the first time. I could have recovered (the backup database is just a bunch of large files which I could have copied off the secondary drives and then back into the appropriate location in the storage pool), but there were various other issues as well; I'd intentionally done a bunch of manipulation of the shares between backup and restore, several backups, etc. The second time was more to see if I'd done something wrong the first time. No such luck. :-)

    As a result of my experimentation, I would really recommend against trying any sort of backup/restore of the system drive. RAID, though unsupported (that's not in the DE tech brief, but it has been stated elsewhere), is a better option. And remember that "unsupported" is a matter of degree here. You're going to be the support for your clients.
  • Friday, January 25, 2008 9:01 PM
     
     

     

    Ken wrote:

     

    "But if the drive is there, and all the control structures are there, why should it assume there's anything wrong?"

     

    For a total wipe out of course this is not relevant.  Unless I tried to restore with a version that has a few out of date tombstones.  Yeah I know it's not supported....

     

    But what if I lose several sectors which are found during chkdsk, and the result is some lost tombstones?   I seem to recall reading that WHS runs chkdsk on its own automatically.


    Will WHS attempt to recover these lost tombstone files?  Like I said, a comparison of the shadow copies on the secondary data drive partitions and the tombstones should show what is missing, and allow them to be repaired automatically.  In theory.  And assuming there isn't some process that finds the orphans and clobbers them.

     

    What I'm trying to figure out is if WHS will do this automatically, or if a tool is provided (PP1, toolkit, etc) to do this.  If so, it may also be usable for restorations of D: backups only.


    If not, I see a need for such a tool.  Not necessarily integrated into the product.  A console tool I can run from a boot CD would be fine (and this way I can be sure the data isn't changing underfoot).


    I do "get it".  I'm just trying to stretch the product so that it works appropriately for a slightly different market than the product designers originally targeted.  In short, I'm mainly trying to get more robustness in case of drive failure, and faster disaster recovery when there is a drive failure.  Any drive.

     

    Thanks,

    Ryan

  • Saturday, January 26, 2008 12:11 AM
    Moderator
     
     

     Ken Warren wrote:
    Look around, brubber. Lots of people have tried RAID arrays of various "stripes". (Hah. Hah. ) I did it the first time I built a WHS PC, putting my entire server on a RAID 5 array.
    Thanks Ken, I know how to build a RAID array.

     

    I know there are numerous threads about RAID arrays; however, I have never seen one where someone used a RAID-1 array for the primary disk only. If you can point me to one or a few, please do!

     

    Personally I agree with Ryan Rogers that a (hardware) RAID-1 boot disk + duplication turned on for all shares AND client backups, combined with off-site backup for shares and client backups, is a very solid way to set up WHS, allowing fast recovery from single drive failure. Personally I would also like to have off-site backup for my system disk; however, that doesn't really seem feasible in the short term.

     

    With a single system disk and a large amount of storage, it may be rebuilding tombstones for days or even longer in case you have to reinstall your system drive. With RAID-1 you just swap out the disk and you're in business again.

  • Saturday, January 26, 2008 1:55 AM
    Moderator
     
     
    Brubber, it's all the same concept. You need to install drivers twice, off of floppy (or USB flash if you can get that to work; I never have). RAID 1 or RAID 5 doesn't matter; it's the same driver. You sacrifice performance with RAID 1, but gain reliability. If that's a good trade-off for you (remember no support) then go for it. If your RAID drivers or HBA turn out to be less than fully baked, or if you made a poor choice of drives and they keep dropping out of the array, well, you made the choice...

    And I agree with Ryan, too, in terms of reliability. RAID 1 for the system drive would be better than the current rebuild mechanism. But it's not an option in the HP MediaSmart Server, which I happen to like. If my production box were a homebrew system, I might have a good RAID controller in it, and a RAID 1 array for my system disk. Or not; I can afford to have my server down for a couple of days, and I'd rather not spend the money if I don't have to. I find the reinstallation option extremely reliable, and the time it takes to rebuild tombstones isn't an issue for me.
  • Saturday, January 26, 2008 2:44 AM
     
     

    While a bit facetious, Ken is correct.  Setting up RAID can seem ridiculously complex to the noob.  But when you are set up with the right tools for system building it's a whole lot easier.   When setting up a RAID1 system, I simply have to image 2 drives instead of 1.  So it takes 8 minutes instead of 4 to do a full install.  No problem. ;-)

     

    brubber wrote:

     

    "Personally I would also like to have off-site backup for my system disk, however that doesn't really seem feasible on short-term."

     

    You probably already know this, or may mean something else entirely, but if by off-site you don't necessarily mean pushed over a WAN, this can be really simple to do.  You basically need a decent hardware RAID 1 solution and a third disk.  You always keep 2 disks in the mirror and 1 offsite.  When ready to back up offsite, you simply force a failure by popping a disk (after shutting down and chkdsk'ing), reboot and mirror to the swapped-in disk (which was brought home from offsite), and then store the disk you removed when you go back off-site.  It takes a few minutes tops.  If you have a decent controller the slowest part is the chkdsk, which is highly recommended as you don't want to be propagating issues around in your backup circle of life. ;-)

     

    Depending upon controller, you can have it finish the mirror within BIOS before even booting into Windows, or you can let it finish it in the background on higher-end controllers.  Also with some controllers you don't have to trick it into a failed state; you can simply enter its BIOS and often there is a function to back up the mirror, but the steps you end up following are pretty much the same.

     

    Note that there are a few neat benefits of this approach:

     

    - you are continually rotating all three drives.  So there will be significantly less wear and tear on a given drive over a given period of time.

    - you have a spare relatively close-by if one were to fail.  It's not hot-standby, but it could be pretty close.  So if you do lose a drive, you aren't limping along with 1 drive for long at all.

     

    Of course, if you got the bucks and the space, a good controller will support hot-spare, and you can have FOUR drives in rotation. ;-)
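    The three-disk rotation described above can be sketched as a tiny simulation. The disk labels and the rule for which mirror member gets pulled (the one that has been in the mirror longest) are assumptions made for illustration; the post does not say which disk to pop.

```python
def rotate(mirror, offsite):
    """Run one backup cycle of the rotation.

    mirror:  list of the two disk labels currently in the RAID 1 set,
             ordered from longest-serving to most recently added
    offsite: label of the disk currently stored off-site
    Returns (new_mirror, new_offsite): the longest-serving mirror disk is
    pulled and sent off-site, and the returning disk is re-mirrored in.
    """
    outgoing = mirror[0]
    new_mirror = mirror[1:] + [offsite]
    return new_mirror, outgoing


def simulate(cycles):
    """Return [(mirror_members, offsite_disk), ...] after each cycle."""
    mirror, offsite = ["A", "B"], "C"
    history = []
    for _ in range(cycles):
        mirror, offsite = rotate(mirror, offsite)
        history.append((tuple(mirror), offsite))
    return history
```

    With three disks the layout repeats every three cycles, so each disk spends one cycle in three sitting off-site, which is where the reduced wear per drive comes from.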

     

    - Ryan

  • Saturday, January 26, 2008 2:58 AM
    Moderator
     
     
     ryan.rogers wrote:

    brubber wrote:

     

    "Personally I would also like to have off-site backup for my system disk, however that doesn't really seem feasible on short-term."

     

    You probably already know this, or may mean something else entirely, but if by off-site you don't necessarily mean pushed over a WAN, this can be really simple to do.  You basically need a decent hardware RAID 1 solution and a third disk.  You always keep 2 disks in the mirror and 1 offsite.  When ready to back up offsite, you simply force a failure by popping a disk (after shutting down and chkdsk'ing), reboot and mirror to the swapped-in disk (which was brought home from offsite), and then store the disk you removed when you go back off-site.  It takes a few minutes tops.  If you have a decent controller the slowest part is the chkdsk, which is highly recommended as you don't want to be propagating issues around in your backup circle of life. ;-)

     

    Depending upon controller, you can have it finish the mirror within BIOS before even booting into Windows, or you can let it finish it in the background on higher-end controllers.  Also with some controllers you don't have to trick it into a failed state; you can simply enter its BIOS and often there is a function to back up the mirror, but the steps you end up following are pretty much the same.

     

    - Ryan

    With WHS you will run into problems with this scenario since it stores reparse points, often referred to as "tombstones", in D:\shares and D:\folders. Even if you change disks on a weekly basis these will get outdated, causing all kinds of problems when you swap in an old disk or if the whole array fails.

     

    I don't like WAN backups; they're slow and I don't like the idea of depending on some third party AND requiring a WAN connection to restore backups. I prefer tape or ext. hdd

  • Saturday, January 26, 2008 4:09 AM
     
     

    Sorry I'm not quite sure I explained myself correctly.

     

    You wouldn't be changing the tombstones during a swap-out.  What you would be swapping out is a copy of the mirror, so that it can be stored offsite for backup purposes.  What you would be swapping in is not authoritative.  It is the previous backup which you will now be overwriting.  You aren't restoring to it.

     

    In this RAID-1 world, it is unlikely you will ever have to restore from this off-site disk, as you have RAID-1. ;-)   But yes, if you did, you would have an issue of shadows being newer than your tombstones, as Ken and I have discussed ad-nauseum in another thread. ;-)

     

    This of course brings up a great point which I didn't think of when I answered your original question; I was thinking in general terms not specific to WHS.  Offsite backup of HD0 is of minimal use for WHS once it's already RAID1.  The reason is that the primary use of offsite backup is a total system wipe out at the primary location.  In which case not only is HD0 gone, but the rest may be as well (fire, flood, theft, etc).  So it really all has to be remote, or why bother?


    So while the process works well in theory, and I've used it for many, many years over various workstations, it wouldn't make much sense for WHS HD0 RAID1.  You really need the entire thing offsite or don't bother. 

     

    Which of course brings up the whole issue of off-site when you have massive datasets, and frankly most (but not all) of my customers simply ignore off-site.  The ones that do bother typically have < 1 TB data and use USB external drives.  Those with > 10TB of data just don't bother.  Most don't even have local secondary backup.  They are completely relying on (today) their RAID5 arrays.

     

    Regarding WAN backups, I agree for the most part, but it is getting better.  It's not uncommon for people to have 1 Mbps upload speeds now; you can push a lot of data in 24 hours with a decent provider.  If you have a lot of large, static files (think videos), it can work quite well, since the data isn't changing frequently, and as such you don't need proprietary software.  DAV or FTP will work.
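    To put a rough number on "a lot of data in 24 hours", here is a back-of-the-envelope calculation. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and a link that is fully saturated unless an `efficiency` factor is passed in; both function names are made up for this example.

```python
def upload_capacity_gb(mbps, hours=24.0, efficiency=1.0):
    """Gigabytes that fit through an uplink of `mbps` megabits per second.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means a perfectly saturated link, which is optimistic).
    """
    bits = mbps * 1_000_000 * hours * 3600 * efficiency
    return bits / 8 / 1_000_000_000


def days_to_upload(data_gb, mbps, efficiency=1.0):
    """Days needed to push `data_gb` gigabytes over the same uplink."""
    return data_gb / upload_capacity_gb(mbps, 24.0, efficiency)
```

    At 1 Mbps this works out to about 10.8 GB per day, so a 1 TB (1000 GB) data set would take roughly 93 days for the initial upload. That is why the approach fits large, mostly static files better than a full multi-terabyte pool.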

  • Thursday, December 03, 2009 2:21 PM
     
     
    Hope you don't mind me chiming in as a newbie (in fact I don't even have WHS, but am looking at it).

    I was actually quite stunned/amazed when I learnt that WHS doesn't have redundancy for the system drive. There are whole articles on how WHS is more flexible than traditional RAID, yet this limitation isn't even mentioned.

    I have had a couple of disk failures this year and am looking to migrate from a basic (no RAID) NAS. WHS looks interesting, but this is a real ____ in the armour IMO.

    Even entry level 2-disk NAS's are coming with mirroring support, so that a disk failure simply results in a hw swap activity -- not even any real sw interaction. All automatic.  That is the ease of use surely customers desire, not having to re-install the OS!

    I was even more surprised to see that 10-year old microsoft technology (dynamic disks) which I already use in another system to do boot drive mirroring is not supported, and even HW raid isn't recommended.

    Whilst one can argue the re-install is fairly quick, the fact is that WHS is an OS. That OS gets customized/extra apps over time. Now one should ALWAYS have a backup, and there may be cases where the install is required, but to not have mirroring to prevent this, it just seems like a target miss to me.

    Am I off-mission? missed the point?
  • Thursday, December 03, 2009 11:03 PM
     
     
    WHS's system drive issue is one of those things that tends to be rather aggressively overlooked and "downplayed" simply because the best and most common sense solution (RAID 1) goes completely against the selling strategy MS has for WHS... no RAID needed.

    While I love WHS and really do like the duplication feature over managing raid arrays, the lack of support for mirroring the system drive or providing an "EASY" way to recover from a loss, is a serious issue with WHS that I hope is resolved sometime soon.   I had to do a restore of my system and I found that the only practical way was to simply copy EVERYTHING from the old share drives manually back into newly created shares on the rebuilt server.   The "automatic" process for reinstallation is very kludgy and prone to error and problems, I won't touch it again with a 10 foot pole.

    All that aside however you can at least rest assured that with duplication your data is safe, and at the end of the day that is the most important thing.   I will probably end up adding a new RAID 1 array for my system drive soon to add reliability there.

    As for NAS and such, certainly they are solutions, but I really think WHS is going to be the way to go long term.  
  • Monday, July 26, 2010 2:09 AM
     
     

    What I know about WHS:

     

     

    - WHS used Drive Extender (DE) technology to make many hard drives look like one large drive to the system.

    - Each data hard drive installed in the system is formatted as its own NTFS volume. No single volume spans multiple drives.

    - WHS is able to make all the hard drives look like one large volume, and lets a user access any file from this one large virtual volume, by using "tombstone" files.

    - Tombstone files are 4 KB files that reside on HD0 on the D: partition. Each tombstone file contains information on one real file, including its file size and the actual location of the real file on the various data drives (or of two real files if folder duplication is turned on).

    - If one data drive were to be removed from WHS and inserted into another machine that can read the NTFS file system, the directories and files on that drive would be entirely readable and accessible.

    - When folder duplication is turned on, WHS will store two copies of all files and folders within that duplicated share. Each copy will be on a different physical data hard drive to protect data from a single drive failure. (Data can survive multiple drive failures without data loss as long as the drives containing the primary and mirror location of any given share don't both go down.)

    - WHS used to incorporate a landing zone (all new files went here first before being moved onto a data drive) but now no longer does. All files are put out on the data drives straight away.

    - If the system drive (HD0) goes down, the data on the D: partition is not lost; however, it can be troublesome to rebuild the system drive to its exact state prior to the drive failure.

    - When the system drive (HD0) goes down, the tombstone files which are stored on HD0 on partition D: are lost as well.

    - Microsoft does not offer any way to back up the system drive or the system partition.

     

     

    - Microsoft’s solution to the primary drive or primary partition going down is to replace the drive (if necessary) and reinstall WHS, being "very careful" to choose "reinstall" during installation instead of "new install".

    - If you choose "new install" during WHS reinstallation then all the data on your installed data drives will be wiped off as WHS prepares them to become part of a new Home Server.

    - If you choose "reinstall" during your reinstall then WHS will scan all the files on your data drives and recreate all the tombstones for all of your files and place them again on HD0 in the D: partition.

    - Even if you do choose "reinstall" you will still need to reinstall all applications you had installed on your WHS, reinstall all add-ons you had for your WHS, recreate all user accounts and associated permissions, recreate your shares and reconfigure all your reinstalled applications and add-ons.

     

    - You may even need to reinstall the Windows Home Server Connector software on every computer that ran it to connect to this WHS (verification please).

    - The above scenario could be very tedious, especially if you have a lot of heavily customized applications and add-ons on your WHS.

    - There is much discussion on the internet about how to overcome this problem, via either mirroring of HD0 or implementation of backup and restore software, but there seem to be serious negatives to both approaches.

     

    - Firstly, mirroring of HD0. It seems everyone cautions against using software RAID to mirror HD0 (why? To the best of my knowledge, software RAID outperforms hardware RAID these days on nearly every level).

    - Though it is not mentioned in any of the forums, I have become aware of, and have confirmed, what I consider to be a significant limitation of hardware RAID. Any volume created with commonly available or affordable (not high-end and extremely expensive) hardware will be locked to the controller that is handling the hardware RAID. This means that if the controller were ever to fail or need replacing, or the hardware that it is embedded into ever failed or needed replacing, the hardware must be replaced with something that has the "exact" same model of controller, otherwise the new controller will not recognize the existing RAID volumes. What this means is that if your hardware RAID controller goes down you have instantly lost both your original and mirrored HD0, unless there is an identical model of controller available out there on the market. For this reason alone I have no interest in hardware RAID over software RAID.

    - Additionally, I have seen on the forums another problem that could arise from mirroring of HD0. If the mirror drive fails you would simply replace it and keep on running as per normal; however, if the primary drive fails and you swap in your mirrored drive to become the new primary, there will be problems. WHS will still boot, but it is reported that because HD0 is included in the virtual volume via Drive Extender (DE), your new primary drive will not be in that virtual volume. It is reported that the problem comes from the Disk ID being different for this new hard drive. Apparently this scenario would prevent the system from being usable, though it seems no one has gone into detail as to exactly how it becomes unusable. I have found instructions here http://www.mediasmartserver.net/2010/01/17/forum-spotlight-how-to-successfully-clone-and-upgrade-a-whs-system-drive/ about how to clone the Disk ID to fix this sort of problem, but apparently the process is extremely tricky, and I have also read somewhere that there may be issues with changing the partition sizes (verification please).

    - Even if the Disk ID clone solution works, that does not address the issue of hardware RAID, so I do not see this as being a suitable solution.

     

    - Alternatively, there has been discussion of making backups for later restore. Most discussion seems to revolve around the idea of making a backup of both the system partition (C:) and the part of the virtual D: partition that resides on HD0, in order to get a backup of the tombstone files as well.

    - It has been said that this sort of backup would only be suitable for restore if the data in WHS had not been changed in "any" way, so that the restored image and restored tombstones would still all point to and describe your files correctly. It's not really explained clearly, but it seems that if old tombstones were restored there could be dire consequences, and there would be no opportunity to get WHS to rescan and update the tombstones before damage was done (verification please).

    - There does not seem to be much discussion about backing up the system partition (C:) only and omitting the tombstone files, perhaps because there is also not much discussion about how to force WHS to rescan and update tombstone files without reinstalling WHS, which is what we are trying to avoid.

    - It is suggested that Acronis is capable of creating a viable "point in time" backup of C:, and it has been suggested somewhere that it could be possible to tell WHS to rescan and update the tombstone files via the Recovery Console (verification please). If these statements are both true, and something went wrong with the system installation, would it not be possible to simply restore your backup of C: only and be instantly back up and running, since you did not modify any of the tombstones during such a restore? And if HD0 did fail, could you not restore your image of C: only to a new drive, create a D: partition in the remaining space, use the Disk ID solution above (if it is workable) to get WHS to accept the new system drive, and then use the Recovery Console to get WHS to rescan your data and recreate the tombstone files? As long as folder duplication was on for all folders, there should be no risk of having permanently lost any files that may have been residing in D: on HD0, right?
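
    To make the "rescan and recreate tombstones" idea a bit more concrete, here is a rough Python sketch of what such a scan could look like in principle. It is only an illustration under my own assumptions: the `Tombstone` record, `rebuild_tombstones` and `single_copy_files` are hypothetical names, and the real WHS on-disk tombstone format and rescan procedure are not modeled here. It just walks each data drive, indexes every file by its share-relative path, and notes where the real copies live.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Tombstone:
    # Hypothetical record, NOT the real WHS tombstone format:
    # just the file size plus the places where real copies were found.
    size: int
    locations: list = field(default_factory=list)


def rebuild_tombstones(drive_roots):
    """Scan every data drive root and index files by share-relative path."""
    index = {}
    for root in drive_roots:
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Same relative path on two drives = two copies of one file.
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                entry = index.setdefault(rel, Tombstone(size=os.path.getsize(full)))
                entry.locations.append(full)
    return index


def single_copy_files(index):
    """Relative paths found on only one drive (no duplicate copy seen)."""
    return sorted(rel for rel, t in index.items() if len(t.locations) < 2)
```

    Calling `rebuild_tombstones` with a list of data drive roots gives back one record per file, and `single_copy_files` lists anything that would have had no second copy, which is exactly the set of files at risk if the only copy had been sitting in D: on HD0.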

     

    I am curious as to people’s thoughts on this.

     

    Alternatively, could one not keep detailed notes on what programs or add-ins were installed on their WHS, along with settings choices, or, if possible, just create a backup of any settings configuration files that might exist? Such notes and configuration file backups could make a reinstall fairly pain-free, no? That would work for those of us who take care of our own servers anyway; it is perhaps not a good solution for servers in clients' homes.
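
    As a rough idea of what that note-keeping could look like if scripted, here is a small Python sketch. Everything in it is an assumption for illustration: `backup_settings` is a hypothetical helper, and which configuration files exist (and where) depends entirely on the add-ins and applications you have installed. It copies whatever config files you list into a dated folder and writes a manifest of installed items next to them.

```python
import json
import shutil
from datetime import date
from pathlib import Path


def backup_settings(config_files, installed_items, dest_root, stamp=None):
    """Copy the listed config files into a dated folder and write a manifest.

    config_files and installed_items are supplied by you; nothing here
    discovers add-ins or settings automatically.
    """
    stamp = stamp or date.today().isoformat()
    dest = Path(dest_root) / stamp
    dest.mkdir(parents=True, exist_ok=True)

    copied, missing = [], []
    for i, src in enumerate(Path(p) for p in config_files):
        if src.is_file():
            # Prefix with an index so files with the same name do not collide.
            target = dest / f"{i:03d}_{src.name}"
            shutil.copy2(src, target)
            copied.append({"source": str(src), "copy": target.name})
        else:
            missing.append(str(src))

    manifest = {
        "date": stamp,
        "installed": list(installed_items),
        "files": copied,
        "missing": missing,
    }
    (dest / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return dest
```

    The destination should of course be somewhere that survives a system drive failure, such as a duplicated share or another machine.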

  • Monday, July 26, 2010 2:10 AM
     
     
    Sorry about the double spacing, not sure what happened there.