WHS Data Drive Bad Block Error


  • This morning I was installing the Windows updates on my WHS, and afterwards I started noticing some really sluggish access behavior. I had to reboot a couple of times to get going, and even the BIOS boot check screens appeared very slowly, one by one. I suspected that I might have a drive/SATA problem, and when I could finally log on I looked at all the console diagnostics. All the drives report healthy and I can access shares, etc., but there are tons of "Disk" errors in the System event log. Specifically this one:

    Source:Disk
    Type:Error
    Event ID:7
    Computer:WHS
    The device, \Device\Harddisk1, has a bad block.

    They recur in batches every few minutes. I looked at the SMART data for all the drives (there are 5 in total, including the system disk), and one of the drives shows a critical "spin-up time" value (SMART attribute 3) with a comment that says "Pre-Failure: Imminent loss of data is being predicted". I have several drives of similar model and size in the system, so it's not obvious exactly which one is "\Device\Harddisk1", and the SMART data doesn't identify them by physical location either. There are two identical drives of this model in the system. Is there any way to map "\Device\Harddisk1" reliably to a physical device, so that I know which drive the OS means? I'm using the "Disk Management Add-in", which lists this same model as Disk1 in its list (it's a Samsung 1.5 F2EG, and the system disk is Disk0), but I'd like to be sure.
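    A minimal sketch, not from the thread: if you copy the text of the System log "Disk" Event ID 7 entries (for example from an exported log), a few lines of Python can tally which \Device\HarddiskN they name. The function name count_bad_block_events and the idea of feeding it exported message strings are assumptions for illustration. The N in \Device\HarddiskN generally matches the "Disk N" number in Disk Management, but confirm against drive serial numbers before pulling anything.

```python
import re
from collections import Counter

# Matches the message text of a "Disk" Event ID 7 entry, e.g.
# "The device, \Device\Harddisk1, has a bad block."
BAD_BLOCK_RE = re.compile(r"\\Device\\Harddisk(\d+), has a bad block")


def count_bad_block_events(messages):
    """Return a Counter mapping disk number -> number of bad-block events.

    `messages` is any iterable of event message strings (hypothetical
    input, e.g. lines copied from an exported System log)."""
    counts = Counter()
    for msg in messages:
        match = BAD_BLOCK_RE.search(msg)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

    If every event names the same N, that number is the disk to cross-check (by serial number) against the physical drive.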

    Second question: what exactly are the steps to repair or replace the drive in this case? I have spare drives of identical size, but I imagine trying to "remove" this disk through the WHS console will be a problem if it is already constantly generating bad block errors. If I just pull it and replace it offline, won't that confuse WHS about where all the files on that volume reside? I'm not sure how to proceed at this point. I do have backups of everything important on the WHS, so I won't lose anything critical, but I'd like to minimize any loss. Is there any way to recover from a bad block? On regular Windows systems I can run chkdsk /f /r on the volume, but what about WHS? Is it really this difficult to recover from a WHS error, or am I just not aware of some procedures for dealing with it?

    Thanks,

    Thursday, January 14, 2010 5:32 PM

Answers

  • If the disk is failing, you should run chkdsk on it prior to the removal. If you're not sure which disk is which, you can run chkdsk on all your drives. This will often correct file system issues well enough to let you remove the disk in the console.

    If you can't remove the disk in the console, you can safely power down your server and disconnect the disk. When you reboot your server, a disk will be shown as "Missing". If you don't know which of several disks it actually is, it's safe to power your server down, disconnect one, and power it back up. If you get the wrong disk, just try again; no harm should come to your files.

    Assuming you've had to physically remove the disk without removing it from the storage pool first, you can remove it "after the fact". You may be warned that you'll lose files (if you don't have duplication on for all shares) and/or backups; accept these warnings and proceed. After the disk has been removed from the pool, connect it to another computer on your network and copy, from the share folders under <driveletter>:\DE\Shares, any files that were in shares without duplication. You may not be able to recover all files because of the bad blocks; unfortunately, if you choose not to turn duplication on for one or more shares, this is a risk you choose to take.

    If you were warned that you might lose backups, you should use the Repair feature in the console to repair the backup database after removing the disk. You may lose some or all of your client computer backups when you do this, because A) the backup database is not duplicated, and B) the storage technology it uses eliminates all redundancy in the database, so if there is damage to the database, the damaged areas have to be removed.


    I'm not on the WHS team, I just post a lot. :)
    • Marked as answer by glorp Friday, January 15, 2010 4:43 PM
    Thursday, January 14, 2010 5:46 PM
    Moderator
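    A sketch of the "copy what you can" step described in the answer above, assuming the failed disk is attached to another computer and shows up there with some drive letter. The function salvage_tree is hypothetical, not a WHS tool; its point is that a file that fails to read (for example because of a bad block) is logged and skipped instead of aborting the whole copy, as a plain Explorer copy would.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file under src_root into dst_root, keeping the
    folder layout. A file that raises OSError (for example because it sits
    on an unreadable sector) is recorded and skipped.

    Returns (copied, failed): relative paths copied, and (path, error) pairs."""
    copied, failed = [], []

    def record_walk_error(exc):
        # os.walk calls this when a directory itself cannot be listed.
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=record_walk_error):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # copy2 also preserves timestamps where the platform allows.
                shutil.copy2(os.path.join(dirpath, name), os.path.join(target_dir, name))
                copied.append(rel_file)
            except OSError as exc:
                failed.append((rel_file, str(exc)))
    return copied, failed
```

    Usage, with hypothetical drive letters (X: for the docked failed disk, a folder on a healthy disk as the target): `copied, failed = salvage_tree(r"X:\DE\Shares", r"D:\recovered")`. The `failed` list then tells you which files to restore from your other backups.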

All replies

  • Thank you for the help (again), Ken. My WHS is a home-brew build, and I have full desktop access when I need it.

    I've pretty well determined exactly what hdd it is by using HDTune and being able to see SMART data + serial numbers from that tool. I'm running your chkdsk batch now on all the drives just to be safe. It may well take a day (5 x 1.5TB drives) but please leave my thread open and/or unanswered for a couple days and I'll report back with questions or further problems.

    Appreciate the help very much.
    Thursday, January 14, 2010 6:03 PM
  • Yes, if you hit further issues, please do get back to us.
    I'm not on the WHS team, I just post a lot. :)
    Thursday, January 14, 2010 6:17 PM
    Moderator
  • Ken,

    In your chkdsk batch, is C: the system partition, D: the pool partition on the system drive, and then the "for" loop over C:\FS each of the other non-system drives in the pool? There's probably a FAQ somewhere that details this, but I'm not sure what it's physically checking as I watch the command window report results.

    Thursday, January 14, 2010 8:05 PM
  • Yes, that's how the drives are laid out. But don't watch the command window. It's boring most of the time and you'll miss all the excitement when you take a bio-break. Instead, after everything is done (and you've rebooted to allow chkdsk to run on C:, which is almost certainly going to be required) open up your Application event log and look for a set of events from the source winlogon. That's how a chkdsk run identifies itself, and you'll get pretty much the same information minus the hundreds of lines of "Nothing yet, Boss!". :)
    I'm not on the WHS team, I just post a lot. :)
    Thursday, January 14, 2010 8:32 PM
    Moderator
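    Following the advice above to read the results from the Application log rather than the console, here is a minimal sketch that pulls the bad-sector figure out of each Winlogon event. It assumes the events have been exported into dicts with "source" and "message" keys (a hypothetical format, not something Event Viewer produces directly) and that the chkdsk summary contains a line like "0 KB in bad sectors."; the function name is also hypothetical.

```python
import re

# chkdsk's summary includes a line such as "0 KB in bad sectors."
BAD_SECTORS_RE = re.compile(r"([\d,]+) KB in bad sectors")


def chkdsk_bad_sector_kb(events):
    """Return the 'KB in bad sectors' figure from each Winlogon event that
    contains one, in log order. `events` is a list of dicts with 'source'
    and 'message' keys (assumed export format)."""
    results = []
    for event in events:
        if event.get("source", "").lower() != "winlogon":
            continue  # chkdsk reports arrive under the Winlogon source
        match = BAD_SECTORS_RE.search(event.get("message", ""))
        if match:
            results.append(int(match.group(1).replace(",", "")))
    return results
```

    A non-zero value for a volume is a strong hint that it is the failing disk.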
  • I think I may need to terminate the full chkdsk run and just check the problem drive. It's been running for almost 4 hours and is only ~33% done on partition D:. At that rate D: alone will take ~12-16 hours, and there are four more identical-sized drives after it. Can you tell me the command to check just one drive? I'm a little unclear on how your batch file expands in the for-do loop.

    I looked in Explorer on the WHS and see the C:\FS folder. Under it are 4 subdirectories ("H", "K", "N", & "Q"). I assume these are mount points for the 4 other pool drives. By poking around a little I can tell that things "hang" and I see new system event log 'bad block' messages when I access files in one of those 4 directories ("K") so I think I have my culprit.

    How do I issue chkdsk on that one pool drive? Is it as simple as:
    chkdsk c:\fs\k /x /r
    ?
     Any way to be exactly sure how the 4 letters map to the physical drives?
    Thursday, January 14, 2010 9:58 PM
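    A rough sketch of the command lines involved, assuming the layout confirmed in this thread (C: system partition, D: pool partition on the system disk, each other pool drive mounted as a folder under C:\FS) and the /x /r switches from the question. This is not Ken's batch file, which isn't shown here; it only illustrates that chkdsk accepts a mount-point folder as its target. /r locates bad sectors and recovers readable data, /x forces a dismount first, and both imply /f. The function names are hypothetical.

```python
from pathlib import PureWindowsPath


def chkdsk_command(target, flags=("/x", "/r")):
    """Return one chkdsk command line for a drive letter or mount point."""
    return " ".join(["chkdsk", str(target), *flags])


def pool_chkdsk_commands(mount_names, fs_root=r"C:\FS"):
    """Commands for C:, D:, then each pool drive mounted under fs_root.
    (C: is in use, so chkdsk will offer to schedule it for the next reboot.)"""
    targets = ["C:", "D:"]
    targets += [PureWindowsPath(fs_root) / name for name in sorted(mount_names)]
    return [chkdsk_command(t) for t in targets]
```

    For the single suspect drive, `chkdsk_command(r"c:\fs\k")` yields exactly the command proposed above. Which physical disk sits behind each folder (H, K, N, Q) can be read from the Disk Management snap-in.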
  • Just let it go. It will check your four additional drives in parallel. It'll take a while, but not 4x as long as D:. Note: the ability to check many terabytes of disk storage for errors in just a couple of days would have amazed me only a few years ago. You're spoiled. :)

    Regarding mapping drives to mount points: you can map a drive in the Disk Management MMC snap-in (all the information is available and easy to get to), but mapping a drive in the Windows Home Server console isn't so easy.

    I'm not on the WHS team, I just post a lot. :)
    Thursday, January 14, 2010 10:30 PM
    Moderator
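    The arithmetic behind the two estimates in this exchange, as a small sketch: 4 hours at ~33% projects to roughly 12 hours for D:, four sequential runs of that length would add roughly 48 hours, while parallel runs are bounded below by a single drive's time (and in practice run somewhat slower because they share the controller). The function names are hypothetical, and as noted later in the thread, chkdsk's percentages in stages 4 and 5 are unreliable, so treat any projection as rough.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linear projection: total time if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def remaining_hours(per_drive_hours, drive_count, parallel):
    """Idealized wall-clock time for the remaining drives. Sequential runs
    add up; parallel runs can be no faster than one drive's run."""
    return per_drive_hours if parallel else per_drive_hours * drive_count
```

    With the numbers from the thread, `projected_total_hours(4, 0.33)` is about 12.1, and `remaining_hours(12, 4, parallel=False)` is 48 versus a floor of 12 when parallel.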
  • Unfortunately the drive died before chkdsk could complete a full check of it. WHS hung after your script got past the D: and C: partitions. I finally rebooted after waiting overnight with no progress on that drive and tons and tons of disk errors in the event log. After the reboot the drive was completely missing in the console, so I went ahead, pulled the drive, and removed it in the console. I'm going to put the drive in a docking station on another computer and see if I can access anything on it at all. Probably not.

    Thanks for the help Ken.

    Last question. I lost some files in the WHS Software share, specifically some of the connector software installation programs. Is there a way to recreate those? I have the .iso version on CD. Can I just copy all the files from the iso's install on a client back to WHS' \software\Home Server Connector Software folder?
    Friday, January 15, 2010 4:45 PM
  • This FAQ will walk you through the recreation of the connector software folder on your server, for a generic WHS installation. For an OEM unit, check with the manufacturer, but probably the only supported option is going to be a server recovery.
    I'm not on the WHS team, I just post a lot. :)
    Friday, January 15, 2010 5:02 PM
    Moderator
  • If the disk is failing, you should run chkdsk on it prior to the removal. If you're not sure which disk is which, you can run chkdsk on all your drives. This will often correct file system issues well enough to let you remove the disk in the console.


    Thanks for the info, but I ran this batch file to chkdsk all drives. My WHS has 6 drives. When I ran it, it said drive C: would be scanned at reboot, and then 5 command prompt windows popped up and seemed to be working, then nothing on stage 5 for an hour. From what I've read it should show a percentage, and I don't hear 5 drives pounding away on the WHS. Any clues? Here is what is in one window; the others are similar except for the numbers.

    I am only getting one or two bad block events daily on drive 1, but to be safe it will be removed once I get chkdsk to verify it, unless anyone thinks that is nothing to worry about. Thanks

    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
    6544 file records processed.
    File verification completed.
    115 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    0 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    25131 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    6544 security descriptors processed.
    Security descriptor verification completed.
    808 data files processed.
    CHKDSK is verifying file data (stage 4 of 5)...

    Monday, February 15, 2010 8:23 PM
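    To see which stage a chkdsk window actually reached, you only need its last stage banner. A minimal sketch (the function name last_stage is hypothetical) that scans pasted console output like the text above:

```python
import re

STAGE_RE = re.compile(r"CHKDSK is verifying (.+?) \(stage (\d+) of (\d+)\)")


def last_stage(output):
    """Return (stage, total_stages, description) for the last stage banner
    in chkdsk console output, or None if no banner is present."""
    last = None
    for line in output.splitlines():
        match = STAGE_RE.search(line)
        if match:
            last = (int(match.group(2)), int(match.group(3)), match.group(1))
    return last
```

    For the window pasted above this gives (4, 5, "file data"), i.e. stage 4 rather than 5. The "of 5" means /r was used: stage 4 reads every cluster in use and stage 5 checks free space, which is why those two stages take by far the longest.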
  • It can take a very long time to run, up to 18 hours.  Stages 1 - 3 go very quickly.  Stages 4 and 5 may seem to stall, but let it keep running.  The percentages given in stages 4 and 5 are not very accurate.  I had one drive sit on 10% for hours.  I let it continue running overnight and it completed.

    --
    ______________
    BullDawg
    Associate Expert
    In God We Trust
    ______________
     
    "agent86oz" <=?utf-8?B?YWdlbnQ4Nm96?=> wrote in message news:a758e6a6-9bfc-402 4-adae-97f8ade8008f...
    BullDawg
    Monday, February 15, 2010 9:55 PM