Disk error and chkdsk execution time

  • Question

  • I've got an HP EX470 with a 500 GB system drive and two 1 TB drives, plus a USB enclosure containing three 1.5 TB drives and one 2 TB drive.

    I noticed yesterday that I was unable to read a couple of files from one of my storage pool drives. It looks like drive 3, which I believe is the first 1.5 TB drive in the USB enclosure. I checked the event log and noticed that disk 3 was reporting a bad block.

    I tried deleting the files and got a cyclic redundancy check error, which prevented me from deleting them. I remotely logged into the WHS and tried deleting them from a command prompt; that failed as well with a cyclic redundancy check error.

    Next I created the check disk batch file on my desktop and executed it:

    rem Stop the WHS pdl and whsbackup services so chkdsk can get exclusive access
    net stop pdl
    net stop whsbackup
    rem /x forces a dismount first; /r locates bad sectors and recovers readable data
    chkdsk D: /x /r
    chkdsk C: /x /r
    rem WHS mounts each storage-pool drive as a folder under C:\fs;
    rem open a separate chkdsk window for each mount point
    for /d %%1 in (C:\fs\*) do start chkdsk /x /r %%1

    It took about 5 hours to complete chkdsk on the first two 1 TB drives inside the WHS; the remaining four drives in the external enclosure are still running 12 hours later. In one of the four remaining windows I noticed that chkdsk identified the two corrupt files, so at least it knows about them.

    A few questions...

    1. How long should it take to run chkdsk on those four drives? They are all on stage 4 of 5, which I understand to be the longest stage. I'm guessing that due to the USB interface they're moving at a fraction of the speed of the first two drives, which are connected via SATA. Could I be looking at 40 hours or more? (There are about 5 TB of data on those four drives.) Any guesstimates?

    2. I selected Yes at the prompt to scan C: at the next reboot... this is a headless server; did I make a mistake here? I don't suspect any problems with C:, so I assume it will just finish rebooting after the chkdsk. But given that I won't be able to see the progress, I guess this could take a while; luckily the system partition is pretty small.

    3. Does a bad block indicate that I should give up on the drive and remove it once chkdsk is complete? Or will chkdsk mark that block as bad, letting me use the rest of the drive? The drive reports as healthy in the WHS console. Unfortunately, with USB-connected drives I can't obtain the SMART information.

    4. If I remove the drive from the server, is it worth placing it in an external enclosure and performing a full format (which I assume would mark the bad blocks) and using it for backup, or is the drive doomed?
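    On question 1, a back-of-the-envelope estimate is possible: the chkdsk /r pass has to read every sector, so the run time is bounded below by capacity divided by sustained read throughput. The throughput figures in this sketch (roughly 30 MB/s effective for a shared USB 2.0 link, roughly 80 MB/s for a SATA drive) are illustrative assumptions, not measurements:

```python
# Rough chkdsk /r duration estimate: the surface scan reads every sector,
# so time >= capacity / sustained read throughput.
# The throughput numbers below are assumed ballpark figures, not measurements.

def hours_to_read(capacity_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to read capacity_gb gigabytes at throughput_mb_s MB/s."""
    return capacity_gb * 1000 / throughput_mb_s / 3600

# Four drives (~5 TB of data) sharing one USB 2.0 link (~30 MB/s effective):
usb_hours = hours_to_read(5000, 30)
# One 1 TB drive on a dedicated SATA port (~80 MB/s):
sata_hours = hours_to_read(1000, 80)

print(f"USB enclosure: ~{usb_hours:.0f} h, one internal SATA drive: ~{sata_hours:.0f} h")
```

    On those assumptions, reading 5 TB over a shared USB 2.0 link works out to roughly 46 hours, so a 40-hour-plus run would not be surprising.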

    Monday, February 7, 2011 2:23 PM

Answers

  • Ok well chkdsk finally finished and I am able to answer some of my own questions, for those interested...

     

    1

    The first two 1 TB drives, which are inside the EX470, finished fairly quickly; I think it was less than 12 hours. The drives in the external enclosure took quite a bit longer: one of the 1.5 TB drives and the 2 TB drive, each about 1 terabyte full, took approximately 30 hours.

    The remaining two 1.5 TB drives, which were completely full, completed at about the 45- and 47-hour marks. The longest one was the drive that reported corrupt files and was responsible for the disk errors showing in Event Viewer.

    One thing I learned is that you can monitor the progress of each chkdsk by opening Task Manager, enabling the I/O Read Bytes column, and watching that count upwards. Eventually it will reach the number of bytes the drive holds, and finish. From this you may be able to estimate how long it will take to complete. (This should actually be a tip somewhere.)
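    That tip can be turned into a quick ETA calculation: sample I/O Read Bytes twice, compute the read rate, and divide the remaining bytes by it. A minimal sketch, where all the sampled numbers are made up for illustration:

```python
# Estimate chkdsk completion time from two Task Manager samples of
# "I/O Read Bytes". All the sample values below are illustrative assumptions.

def eta_hours(bytes_read_t0: int, bytes_read_t1: int,
              interval_s: float, drive_bytes: int) -> float:
    """Hours until the cumulative read counter reaches the drive's total size."""
    rate = (bytes_read_t1 - bytes_read_t0) / interval_s  # bytes per second
    remaining = drive_bytes - bytes_read_t1
    return remaining / rate / 3600

# Example: two samples taken 60 s apart while scanning a 1.5 TB drive
sample_0 = 200 * 10**9                  # 200 GB read at the first sample
sample_1 = 200 * 10**9 + 1800 * 10**6   # 1.8 GB more a minute later (30 MB/s)
print(f"~{eta_hours(sample_0, sample_1, 60, 1500 * 10**9):.0f} hours to go")
```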

     

    2

    I powered off the server after chkdsk was complete and left it off overnight... rebooted it this afternoon, and this seems to have gone OK; I had full access to my WHS shortly after power-up.

     

    3 & 4

    Even though Event Viewer was reporting bad blocks, I didn't see any bad sectors while chkdsk was running... chkdsk did report a few files with errors, which it repaired.

    Once chkdsk was complete, I located these files, deleted them, and replaced them with copies from my backup, just in case. (I don't actually have duplication turned on; I keep an offline backup of everything on my server.)

    Whether the drive that was reporting errors is still good remains to be seen. It's currently reporting healthy, but I'll have to check the event log every day for the next few days and look for disk errors on drive 3. If I start seeing errors again on that drive, I'm convinced I will have to remove it from the server.

    • Marked as answer by activoice Wednesday, February 9, 2011 11:23 PM
    Wednesday, February 9, 2011 11:23 PM

All replies

  • Hello,

    Thank you for the useful information, especially the tip about Task Manager!

    I was starting to get antsy about just how long the massive chkdsk operation was going to take. I have four drives connected to the Intel ICH9 controller on my P35-based board (3 x 1 TB and 1 x 2 TB), and they have all completed chkdsk.

    However, my larger drives, which are running on a Promise FastTrak TX4650-based card (2 x 1.5 TB, 1 x 2 TB, and 1 x 1 TB, the last one being the drive reporting errors), are taking a MILLION years (well, OK, eight hours so far) to complete chkdsk.

    My 1 TB drive, the one reporting errors, has some bad clusters that appear to be getting "fixed", but overall the process seems like it will never finish.

    Now that I've read this post, I might be able to extrapolate just how much longer the wait will be before I can see what comes next.

    Handy tip for Microsoft or some savvy programmers: how about a chkdsk microscope tool? Something that lets you "see" what chkdsk is working on and what it is encountering. I for one would LOVE to have this, especially on a server-based OS.

    Thanks again Activoice, I really appreciate this!

    Alex

    Tuesday, February 22, 2011 12:30 AM
  • I've got an Acer EasyStore H431, with the OS on the original 1 TB drive and three WD 2 TB drives filling the remaining slots. Recently I started receiving warnings about the health of one of the 2 TB drives, so I bought a new one and told the server to remove the failing drive (after Repair Drive failed to improve things). I started the operation last Saturday morning, and it is now the following Thursday evening. From the progress bar, it is about 80% of the way through.

    If this is what Drive Extender does for us, then I would much rather have gone down the RAID 5 route, had the drive fail hard on me, and spent a fraction of the time rebuilding the array with a new drive.

    Thursday, February 24, 2011 10:30 AM