Hard drive failed, now can't remove?

  • Question

  • I am having a nightmarish issue.

    Configuration: HP device, current patches/software via Windows Update (so PP2 I believe).

    Brief background: I originally added 2 HDs to the device, and so had 3 HDs. In the space of 2 days both my added HDs failed.

    WHS said they were healthy right up to the point they failed, but the Windows system event log shows warnings for days ahead of time - this makes me unhappy, because Windows clearly knew I was in trouble, but WHS didn't tell me.

    Current state: Now I'm in trouble, and don't know how to get out.

    What I did:

    When DR1 started failing I used WHS to remove the drive. After 3 tries, it worked (maybe - while the drive was "removed" I had all sorts of service failures and file conflict warnings via WHS).

    DR2 started failing as soon as I got DR1 out of the box.

    I quickly purchased and added a brand new HD, so I'd have room to save my data.

    However, after numerous tries (at least 4-5) I was unable to use WHS to remove DR2. The remove process just locks up - either that or "this may take hours" means "this may take days" where "days" is more than 3 (yes, I let this run 3 days with no progress).

    So I powered down and manually removed DR2.

    Now WHS shows DR2 as "missing", but when I try to use WHS to remove it, I am told the volume has files in use and can't be removed. Obviously this is a poor error message - the drive isn't in the machine and so clearly can't have files in use.

    I've rebooted the server, but still get the same result.

    What do I do now? Am I totally done?

    I've been an avid supporter of WHS for some time now, but this sequence of events is seriously shaking my confidence in the product...
    Monday, June 22, 2009 2:21 PM

All replies

  • It sounds to me like you had multiple drives failing at the same time. This is a scenario that Windows Home Server can't really deal with. (Nor can the usual RAID levels, 1 and 5. Losing multiple drives from a single array will destroy all the data on the array.) The error you're seeing (files in use on a drive that's not installed) is likely spurious, and I would recommend you submit a bug report on Connect. You should also use the Windows Home Server Toolkit server component to send logs from your server and reference the CAB number in your bug report. If you don't have it installed, you'll find the toolkit here, and directions for installation (both client and server components) here.

    As for your data, I would shut your server down now, and manually recover data off of all the drives (including the ones that are indicated as failing, assuming they will still spin up), following the process outlined in this FAQ posting.

    I'm not on the WHS team, I just post a lot. :)
    Monday, June 22, 2009 2:59 PM
    Moderator
  • Thank you Ken.

    I do understand that multiple failures may be outside the scope of WHS - we all hope this is not a common scenario - and is why I have external backups of all critical files.

    Though it appears I'm going to lose a lot of stuff I didn't think was critical, but which may take me a long time to recover... I am going to have to rethink what "critical" means to possibly encompass "everything" from now on - fortunately USB hard drives are cheap. As I re-rip my entire music library I'll have lots of time to ponder this idea...

    But my big concern is how to get WHS back to a workable state?

    Since I can't convince it to remove this now-dead hard drive, I can't get the backup service, balancing service or anti-virus service to run.

    How do I get WHS back running???
    Monday, June 22, 2009 3:10 PM
  • Test every drive in your server, connecting them to another machine to do so. I like Spinrite, but there's lots of good software out there for the purpose.

    Once you're satisfied that you're experiencing failing drives, rather than a failing server motherboard, you can replace any dead/dying drives and perform a server recovery or factory reset, following the instructions in the server documentation.

    As for data loss, in the future you can back up your shares to external media (or internal, though I honestly can't figure out the point of that) and take the copy off-site, using tools that were added to Windows Home Server in Power Pack 1. I personally don't do this, but that's mostly because I have a different (and highly customized for my own normal workflow) method that uses multiple disks that I rotate off-site, and robocopy for the share backups. (I'll note that my method is unsupported, and is probably less flexible than the supported server backup feature in some ways.)
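
A robocopy-based share backup of the kind mentioned above could be sketched as follows. This is only an illustration, not the script actually used in this thread: the helper names, the paths in the usage note, and the choice of switches are assumptions. Note that `/MIR` mirrors the source, so files deleted from the share are also deleted from the backup copy.

```python
import subprocess


def build_robocopy_command(source, destination, log_file=None):
    """Build a robocopy command line that mirrors `source` into `destination`.

    Hypothetical helper for illustration only.
    /MIR  mirror the directory tree (also deletes files missing from source)
    /R:2  retry a failed copy twice
    /W:5  wait 5 seconds between retries
    /NP   do not print per-file progress percentages
    """
    command = ["robocopy", source, destination, "/MIR", "/R:2", "/W:5", "/NP"]
    if log_file:
        command.append(f"/LOG:{log_file}")
    return command


def run_share_backup(source, destination, log_file=None):
    """Run the backup; return True if robocopy reported no copy failures."""
    result = subprocess.run(build_robocopy_command(source, destination, log_file))
    # robocopy exit codes of 8 or higher mean at least one failure occurred;
    # lower values indicate success (possibly with files copied or extras found).
    return result.returncode < 8
```

For example, `run_share_backup(r"\\SERVER\Photos", r"E:\Backups\Photos")` would mirror a share to an external disk, where both paths are placeholders for your own share and drive letter.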

    I'm not on the WHS team, I just post a lot. :)
    Monday, June 22, 2009 4:19 PM
    Moderator
  • I'm doing the server recovery now, so we'll see how that goes.

    Recreating the user accounts isn't a big deal, but I'm wondering how/if my dynamic DNS service will get reconnected - hopefully that's pretty automatic. I suppose I'll have to force Windows Update to run a few times so it re-downloads and installs powerpacks 1 and 2?

    My big complaint and concern with this whole thing is that Windows knew the hard drives were going to fail for days ahead of time, but WHS was without clue. A consumer-grade appliance should be smarter than that, and should have been warning me I was going to be in trouble long before I really was in trouble.

    In other words, I'm going to spend countless hours recreating my lost data, grumbling the whole time about the fact that WHS doesn't even talk to the underlying OS to find out easily available information that could have saved me incredible amounts of time/effort/pain.

    I'll add a bug on Connect to this effect as well, because this seems like a major, major hole in the WHS story.

    (And when I say this, I'm not saying it to you Ken, I'm saying it to the WHS team so they can fix the issue and make WHS better. I really appreciate your help Ken, thank you.)
    Monday, June 22, 2009 4:26 PM
  • As for data loss, in the future you can back up your shares to external media (or internal, though I honestly can't figure out the point of that) and take the copy off-site, using tools that were added to Windows Home Server in Power Pack 1.

    I do regular backups of "critical" data to an external HD. I agree, internal makes little sense to me as well.

    My problem is that I should have considered everything to be "critical", and I didn't...

    All my work files, personal files, home videos and photos are "critical" because they can't be replaced, and they are OK.

    All my video, music, downloaded install images - these can be replaced, so I didn't consider them "critical". But in retrospect I should have just purchased more external HDs and backed them up, because it will take hours (days really) to recover this information, and if I calculate my hourly rate against the time this will take I could have purchased a whole lot of external HDs...
    Monday, June 22, 2009 4:30 PM
  • ...
    and if I calculate my hourly rate against the time this will take I could have purchased a whole lot of external HDs...
    You must bill at a lower rate than I do. :) At my usual rate, I could buy a lot of spare servers in the time it would take to re-rip my 1/2 terabyte of audio.

    But only you can determine if spending the time is worth more or less than spending the money...

    I'm not on the WHS team, I just post a lot. :)
    Monday, June 22, 2009 4:45 PM
    Moderator
  • Damn.

    The Windows Home Server Setup (after recovery) failed.

    In other words, the recovery disk did its thing and said it worked. It then launched the WHS Setup, which says "Server initialization failed" and says to contact WHS support. The back/next buttons are disabled, so all I can do is close the software.

    The server's status light is blinking blue, and the two drives are solid purple.

    Any suggestions on what I should try at this point?
    Monday, June 22, 2009 4:48 PM
  • Did the server reboot after the recovery? If so, then I would contact HP. If not, reboot it and see what happens.

    As for the failing drives, there are a lot of transient errors that can occur, so Windows Home Server has a built-in error threshold (errors on 4 successive days, iirc). That threshold is, obviously, a balancing act on the part of the WHS team. They have to pick the right spot between alerting the user on every error (which can cause people to have every drive reported as failing repeatedly over a period of only a few weeks), or never alerting the user until a drive fails. So that's a design decision, and is subject to regular review.
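
To make the trade-off concrete, here is a minimal sketch of a "consecutive days" threshold like the one described above. It is not how Windows Home Server actually implements its check; the function names and the default of 4 days (taken from the "iirc" figure above) are assumptions for illustration.

```python
from datetime import date, timedelta


def longest_error_streak(error_days):
    """Return the longest run of consecutive calendar days with at least one error."""
    days = sorted(set(error_days))  # several errors on one day count once
    longest = current = 0
    previous = None
    for day in days:
        if previous is not None and day - previous == timedelta(days=1):
            current += 1  # this day directly follows the previous error day
        else:
            current = 1   # a gap (or the first day) starts a new streak
        longest = max(longest, current)
        previous = day
    return longest


def should_alert(error_days, threshold_days=4):
    """Alert only when errors occurred on `threshold_days` successive days."""
    return longest_error_streak(error_days) >= threshold_days
```

With this rule, a drive that logs errors on June 18, 19, 21 and 22 never triggers an alert, because the gap on June 20 resets the streak. Lowering `threshold_days` warns earlier at the cost of more false alarms from transient errors.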

    Updates: once you get your server back up and running, you will have to run updates through the console repeatedly, yes.

    I'm not on the WHS team, I just post a lot. :)
    Monday, June 22, 2009 7:20 PM
    Moderator