Remove a drive - how long should it take? (and, is "file conflict" truly drive-related?)

  • Question

  • I started getting "file conflict" errors on Thursday (3/26/09).  Given that the affected files were in my media share(s), and that I don't want to lose stuff, I started working up a plan to mitigate, assuming it is a drive problem.

    Current configuration is 2 x 500 GB drives (1 system, 1 "pool") and 2 x 1 TB drives, all in an HP EX475 box, with about 1 TB of space free.  My initial intent was to pull some stuff off onto an external backup drive to free up some space, then "remove" the 500 GB drive as part of an attempt to diagnose/resolve the file conflicts.  My media share(s) all have folder duplication ON, so I assumed it would take a while for the system to move data from the 500 GB drive and "stripe" it onto other drives in the pool.

    But...the "remove" process has now been running for over 36 hours.  It appears to be making progress (I am remoted into the console, and the progress bar advances every few hours), but... this seems ridiculous.

    Is it normal for it to take this long?

    And... different topic, but - in speaking with a number of colleagues, this morning, it seems several of us started receiving "file conflict" errors on Thursday.  Which makes it really hard to believe it is a drive problem after all (would be an incredible coincidence for all our drives to start having a problem on the same day).

    Any hints on what the "file conflict" issue *really* means, and how to resolve it?

    B
    Monday, March 30, 2009 5:44 PM

All replies

  • Yes, running for over 36 hours is extraordinarily long.  Did you run chkdsk /r on that drive before you started removing it?

    File conflicts are usually caused by one of two things: 1) a hardware issue of some sort, or 2) files being left open for more than 24 hours.  Another possibility is that your tombstones are no longer pointing to the files on the secondary drives, but that is still usually indicative of a hardware issue.  What is the exact detailed error message in your Console?
    Monday, March 30, 2009 11:48 PM
    Moderator
  • Don't have it written down, and the "remove disk" process is still "running" (50+ hours, now).

    Which begs the question: in order to tell you the exact error message, this "remove disk" will have to end somehow.  Is there a "safe" way to kill it?  If I reboot the server, how badly will I be shredding the data on my server?  I can live with losing a few files, but I *really* don't want to lose my entire media library.

    B
    Tuesday, March 31, 2009 3:26 AM
  • OK, now I'm officially starting to freak out.

    I opened up "shared folders" on a connected PC, and it indicated I have nothing but a share called "printers".

    Is that an artifact of the "remove disk" process (e.g., while it is re-striping all my data, it makes the shares unavailable)?  Or... have all my folders been nuked?

    I'm going to be REALLY unhappy if it is the latter.
    Tuesday, March 31, 2009 4:31 AM
  • As I recall, that is normal (although I've never personally had a drive fail).  I believe the shares are disabled during drive removal (in order to keep the data from being accessed while it's being relocated).
    Tuesday, March 31, 2009 4:36 AM
    Moderator
  • An update:

    The "remove disk" process ended when the power went out this morning (72-ish hours).  It "appears" to have failed in a non-destructive way; by that I mean that my shares appear intact, and nothing appears to have been lost.  But, the root problem remains: I am getting "file conflicts", and I am not able to copy everything off my server (either to an external e-sata drive set up as a backup drive, or by copying folders from the shares onto a connected PC).  In both cases I get a message that there was a device error.

    The exact "file conflict" error I get is "There are file conflicts.  The request could not be performed because of an I/O device error".

    I have rebooted my server several times since Thursday, and the exact same pattern occurs each time:
    -- it boots "healthy" (all drive/hardware indicator lights "blue", network status is "healthy")
    -- after 10-15 minutes, network status turns to yellow, and I get an alert saying "you have file conflicts".  All indicators are still "blue".
       (if I look at "details" through the console, there will be 2-4 files listed as having "file conflicts" - it is always the same files)
    -- after another 10-15 minutes, I get another alert about file conflicts
       (if I look at "details" through the console, there will now be 12-15 files listed as having file conflicts)
    -- after another 10-15 minutes, I get an alert indicating that the "backup process has failed", and network status goes to "critical".
       (at that point, the "status" light on the front of the box goes red)

    It generally stabilizes about this point; however, there are a number of things worth (?) noting:
    -- all "health" indicators on the front of the box are still blue.  The box itself is not aware of any hardware issues.
    -- thru the WHS console, all drives continue to be listed as "healthy".  I have NEVER gotten an indication that one was faulting or needed "repair"

    ALL of this leads me to believe this is a set of server/software issues.  I have *never* gotten an indication that a drive is going bad, only that there are file conflicts (presumably between the two separate copies of files striped onto separate drives as a result of folder duplication).

    Edited to add: I forgot to mention, I did a "chkdsk D: /r" this morning.  No errors, no bad sectors, no repaired indexes, no recovered blocks.  I don't think it is a drive problem.

    At this point, I am less concerned about replacing drives, and more concerned about how to get my stuff *OFF* this box so that I have other options.  As mentioned above, if I try to back up folder-shares onto a directly-connected e-SATA drive set up as a backup device, or copy them onto a USB drive connected to a LAN-connected PC, I get I/O errors.  And not just on those 10-12 files - I get them on an alarming percentage of files, enough to really worry me that the WHS "folder duplication" construct is not going to end up protecting my files; it is going to end up making them unrecoverable.  I bought into the WHS concept in no small part because of the folder duplication and the implied "safety net" it represents.  I have steadily lost confidence in that safety net since Thursday, and now I pretty much want to get away from it as fast as I can.  I am *certain* that I could put my SATA drives into a NAS enclosure, hang it off my network configured as a RAID-1 share, and NOT be having these problems.

    In that light, I'd welcome suggestions: In particular, any hints about how I might use some combination of directly-connected e-sata or usb drive and command-line (COPY? XCOPY?) on the underlying WS03 console itself to slurp stuff off my "D" drive.

    Thanks in advance.

    B
    Tuesday, March 31, 2009 7:13 PM
  • Please see this FAQ posting for some more information on these types of errors. The "Device I/O" errors are probably why Windows Home Server was taking forever to remove the drive in the first place.

    As for your data, you have a couple of options. If all your shares have duplication turned on, shut the server down, remove the drive having problems, and restart. You should be able to remove the drive then. You may lose your backup database as a result of this.

    Your other option is to remove each drive from your server one at a time and connect it to some other computer. Then copy everything you find in the hidden folder <drive>:\DE\Shares\etc. to somewhere else. This is your best path if you want to move away from Windows Home Server.

    I'm not on the WHS team, I just post a lot. :)
    Tuesday, March 31, 2009 7:28 PM
    Moderator
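For the second option above (attaching each drive to another computer and copying out the hidden share folder), a minimal Python sketch of a copy that keeps going past unreadable files might look like the following. It is an illustration under assumptions, not a WHS tool: the function name `salvage_tree` and the example paths are hypothetical, and a file that fails partway through may leave an incomplete copy at the destination.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every file under src_root into dst_root, skipping files that fail.

    Returns (copied, failed): copied is a list of source paths that were
    copied, failed is a list of (source path, error text) pairs.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same folder layout under the destination.
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies contents and timestamps
                copied.append(src)
            except OSError as exc:
                # A device I/O error on one file should not stop the run;
                # record it so the file can be retried from another drive.
                failed.append((src, str(exc)))
    return copied, failed
```

Assuming the pool drive shows up as E: on the other computer and F: is the destination, a call such as `salvage_tree(r"E:\DE\Shares", r"F:\Salvage")` would return the list of files that could not be read, which is the list to look for on the other drives.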
  • "If all your shares have duplication turned on, shut the server down, remove the drive having problems, and restart. You should be able to remove the drive then. You may lose your backup database as a result of this."

    Yes, all my shares have duplication turned on.  However, I'm not sure how to determine which drive is "having problems" - they all appear to be healthy (at least, the WHS console has never indicated anything but "healthy" for any of them).  As an aside, I don't have any PC backups done through WHS, so losing the backup database is a non-issue.  I've been primarily using the WHS shares as a media repository (family photos, movies, iTunes library, etc.) and some of that stuff is irreplaceable.

    "Your other option is to remove each drive from your server one at a time and connect it to some other computer. Then copy everything you find in the hidden folder <drive>:\DE\Shares\etc. to somewhere else. This is your best path if you want to move away from Windows Home Server."

    Ah!  I have four SATA drives, and a two-bay SATA NAS enclosure.  If I take the drives out of WHS and stick them in the enclosure, it sounds like I can get where I want to get.  It occurs to me, though, given that I have four drives and folder duplication "on" for all the shares, it may be a considerable amount of work to "unweave" all my files from 4 drives and get them stitched together into a single set of folders that has everything.  Is that a fair assumption?  Are there tools (e.g., de-dup programs) that might help?

    I'm curious if you have any experience with (for example) SyncToy, and using it (or similar) to slurp folders directly from WS03 D:\data to ...somewhere else.  It seems to me that it might do a better job of keeping folder structures/organization intact, if it works?

    B
    Tuesday, March 31, 2009 7:37 PM
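On the question of "unweaving" files from four drives, a rough Python sketch of the merge is below. It assumes (this is not confirmed in the thread) that a duplicated file sits at the same relative path under each drive's share folder, so the relative path can serve as the de-dup key; the function name `merge_roots` and the drive letters in the usage note are hypothetical. Same path and same size is treated as a duplicate, which is a cheap check rather than a content comparison.

```python
import os
import shutil


def merge_roots(src_roots, dst_root):
    """Merge several share trees into dst_root, keeping one copy per path.

    Returns (copied, duplicates, conflicts, failed).
    """
    copied, duplicates, conflicts, failed = [], [], [], []
    for root in src_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            for name in filenames:
                src = os.path.join(dirpath, name)
                rel = os.path.normpath(os.path.join(rel_dir, name))
                dst = os.path.join(dst_root, rel)
                tmp = dst + ".part"  # copy here first, rename when complete
                try:
                    if os.path.exists(dst):
                        if os.path.getsize(dst) == os.path.getsize(src):
                            duplicates.append(rel)  # second copy, skip it
                        else:
                            conflicts.append(src)  # same name, other size
                        continue
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.copy2(src, tmp)
                    os.replace(tmp, dst)
                    copied.append(rel)
                except OSError as exc:
                    # Unreadable on this drive; the duplicate on another
                    # drive gets its chance later in the loop.
                    failed.append((src, str(exc)))
                    if os.path.exists(tmp):
                        os.remove(tmp)
    return copied, duplicates, conflicts, failed
```

With the drives attached as, say, E: and F:, `merge_roots([r"E:\DE\Shares", r"F:\DE\Shares"], r"G:\Merged")` would produce one folder tree. Copying to a `.part` name first means a file that fails on one drive does not block the good copy on another; anything in `conflicts` deserves a manual look.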
  • You only checked one partition.  You need to run chkdsk /r on the other drives (found inside C:\fs).

    Tuesday, March 31, 2009 7:48 PM
    Moderator
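To make the chkdsk advice above concrete, here is a small Python sketch that lists one `chkdsk <path> /r` command per folder found under a mount root such as C:\fs. It assumes each subfolder there is the mount point of one secondary drive and that chkdsk is given the mount point as its volume argument; the function name `chkdsk_commands` is hypothetical. The sketch only builds the commands and does not run them.

```python
import os


def chkdsk_commands(mount_root):
    """Return one ['chkdsk', <mount point>, '/r'] command per subfolder."""
    commands = []
    for entry in sorted(os.listdir(mount_root)):
        path = os.path.join(mount_root, entry)
        if os.path.isdir(path):  # ignore any stray files in the mount root
            commands.append(["chkdsk", path, "/r"])
    return commands
```

On the server, `for cmd in chkdsk_commands(r"C:\fs"): print(" ".join(cmd))` would print the commands to run by hand from a command prompt, in addition to the checks on C: and D:. Expect each /r pass on a large drive to take hours.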
  • Regarding chkdsk, which kariya21 caught and I didn't:


    I'm not on the WHS team, I just post a lot. :)
    Tuesday, March 31, 2009 8:06 PM
    Moderator
  • The duplication feature mirrors the data onto all available drives, thus allowing for a more robust recovery solution. There is no need to "unweave" your data off of four drives. It was designed as an easy-to-configure software RAID substitute. As long as you have one good drive, you should be able to get everything duplicated back by following Ken's instructions. You may, however, lose any non-duplicated data that happened to be stored on the failed drive.
    Tuesday, March 31, 2009 8:30 PM
  • Ken, your hyperlink is pointing to the wrong location (you have whshardware in the address in addition to whsfaq).  Here is the correct link.
    Wednesday, April 1, 2009 1:04 AM
    Moderator
  • I am not loving the new iteration of the forum software. I made that post in threaded view on the thread list page, where the link was right. Drilling down into the single thread, the link is broken. Fixing it in threaded view on the thread list page didn't work. Fixing it in the drill down view did. This behaviour can be observed in the quoted link in kariya21's post. Please don't fix it. :)

    <sigh> another bug for the forums people...

    I'm not on the WHS team, I just post a lot. :)
    Wednesday, April 1, 2009 2:12 AM
    Moderator
  • Oh ok, that makes more sense.  (I couldn't for the life of me figure out how you would have gotten the extra whshardware in your link.)  That may also explain why in one of Lara's posts today, she didn't create a hyperlink but instead just copied and pasted the website as text in her response.
    Wednesday, April 1, 2009 2:29 AM
    Moderator