Identifying a bad disk

  • Question

  • Hi,
      I have horrible luck with disk drives. They drop like flies around me. I just built a WHS, for example, with 3 old IDE drives and one nice new SATA drive.

      I was able to copy over about 400 GB of data. I have folder duplication turned on and I have about 1.5 TB of free space right now. And I have file conflicts: "Data error (cyclic redundancy check)". Indeed, when I try to copy those files off the WHS I always fail. No idea if those files had been duplicated already, so the data may well be gone (I'll live).

      But more important is why this happened. Server storage shows everything healthy. But when I look at the event viewer (when I connect with terminal server) I see the system log filled with "Error" from the disk sub-system - "The \Device\Harddisk0 has a bad block." Ok -- fine. I would believe the OS that the disk has a bad block.

      So, I want to remove that disk from WHS's storage (and have it recover everything it can) and then eventually remove it from the system. But how do I identify which hardware is which? I have 3 identical drives installed (they are IDE disks) and a 4th on SATA. I suspect it is one of the first three that is going south. How can I get from \Device\Harddisk0 to a mount point, where perhaps I could run "repair", and from there to the matching one of the identical rows in the connector's "Server Storage", where I can click "Remove" to remove it? Many thanks!

      Cheers,
        Gordon.

    Gordon
    Tuesday, November 11, 2008 6:35 PM

Answers

  • Gordon,

    It would be worthwhile running chkdsk /r on each of these drives. To do so, you can reach the individual mount points through a Remote session. Open Explorer and select each drive listed under the C:\fs mount point in turn: right-click the drive, select Properties, then on the General page select the Properties button again. In this second Properties dialog, select Tools and enable both check boxes. Repeat this for each of the drives listed; the checks will then require a server reboot to run.

    To actually identify an individual drive, the Disk Management Add-In can help. It lists each drive's details, including its serial number, so you can then match it to a physical drive. Alternatively, detach a drive and reboot; you will get error messages, and should be able to link the missing drive to what's listed (or now not listed).
    To actually remove the drive, use the drive removal option in the Console. This will try to copy any user data off to available free space, so you may need to add another drive if you're short of space. After it has run its course, you can just unplug the drive and replace it, allowing WHS to move data around if necessary.

    Colin






    If anyone answers your query successfully, please mark it as 'Helpful', to guide other users.
    • Marked as answer by GordonTWatts Tuesday, November 11, 2008 7:14 PM
    Tuesday, November 11, 2008 6:55 PM
    Moderator

All replies

  • Hi,
      Ok -- digging a little deeper... :)

      The \Device\Harddisk0 matches the drive "#" that you find in the disk management software (Computer -> right click -> Manage -> Disk Management). I can then right click on the disk area of Disk 0's row and click "modify drive letter and mount points" to see that it is mounted in c:\fs\C (confusing that it is also called "C").
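
      Since the number in \Device\HarddiskN lines up with the "Disk N" rows in Disk Management, the System log entries can be tallied per disk. The following is only a minimal sketch, assuming the event messages have already been exported as plain text lines; the function name and the sample messages are hypothetical and modelled on the error text quoted in the question.

```python
import re
from collections import Counter

# Matches the device name that appears in disk errors, e.g. \Device\Harddisk0
DEVICE_RE = re.compile(r"\\Device\\Harddisk(\d+)", re.IGNORECASE)


def bad_block_counts(messages):
    """Count 'bad block' messages per disk number (hypothetical helper)."""
    counts = Counter()
    for msg in messages:
        if "bad block" not in msg.lower():
            continue  # ignore unrelated events
        match = DEVICE_RE.search(msg)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample data, not taken from a real log export.
sample = [
    r"The \Device\Harddisk0 has a bad block.",
    r"The \Device\Harddisk0 has a bad block.",
    r"The \Device\Harddisk2 has a bad block.",
]

for disk, count in sorted(bad_block_counts(sample).items()):
    print(f"Disk {disk}: {count} bad-block event(s)")
```

      The disk with the highest count is the first candidate to check in Disk Management.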

      Now -- how does the order I see in the disk management console match up to the order I see in the WHS console? At this point I could just start disabling disks and see when the disk in the explorer empties out... Which is how I will track this down now. :(

      Cheers,
        Gordon.

    Gordon
    Tuesday, November 11, 2008 6:49 PM
  • Hi Colin,
      Thanks for the information. You have an excellent point -- it is harmless to run chkdsk on a healthy drive. So I'll do that and then look at the event logs after a reboot.

      BTW - something interesting - some of the drives require me to do a reboot before doing the chkdsk and others do not. Odd.

      And finally - you mentioned serial numbers. Where do you get those from? I have drive model name when I right click and select "Properties", but sadly I have three identical drives installed in the system.

      UPDATE: and thanks for the last bit - I wasn't sure how robust WHS was to removing drives and putting them back in and rebooting; was worried some data might be lost! But it sounds like as long as I don't get too crazy, and only unplug one drive at a time, I'll be able to narrow it down without losing data.

      Cheers,
        Gordon.

    Gordon
    Tuesday, November 11, 2008 7:00 PM
  • Gordon,
    In the Disk Management Add-In, highlight the drive and then either right-click and select 'details', or click 'details' on the menu. The resulting pop-up should list all the information available for that drive; each of my servers here shows its GUID, Serial Number etc.

    Unplugging the drive and restarting the server shouldn't do any damage to data, just so long as you don't leave it in that state for any length of time; otherwise any operation done by the server could be detrimental. (I did just this on a server, to identify all the drives, and it had no ill effects.)

    Colin




    Tuesday, November 11, 2008 7:12 PM
    Moderator
  • Hi Colin,
      Wow. That was very brave of you!

      I'll do that - thanks a lot! And I'll report back if I have further questions. Checking 2 TB of disk space takes a while...

        Cheers,
          Gordon.

    Gordon
    Tuesday, November 11, 2008 7:14 PM
  • Hi Colin,
      Ok -- this is very odd! I never see a menu with a "details" option. Here is exactly what I do:

    1) From start menu right click on computer and select "manage"
    2) Click on "disk management". This gives me, in the right hand pane, two different lists. The top list shows the logical drives (most of them called "DATA" and one called "SYS"), and the lower half shows the hard drives in the left column ("Disk 0", "Disk 1", etc.) and, in the right hand column, the logical drives on each actual hard drive.
    3) Click on "Disk 0" in the lower half, in the leftmost column (i.e. right where it says "Disk 0"). I'm given only "Convert to Dynamic disk", "Properties" and "Help" as options. The Properties dialog box gives me the model name (Maxtor 4A300J0), device type ("Disk Drives"), manufacturer ("(Standard disk drives)"), and location ("Location 0(0)"). I'm pretty sure this is where you are seeing the serial number and GUID, right?
    4) Right click on the logical disk portion of the lower half and there you are given "Properties" along with "Mark partition as active", etc. Properties from there seems to be even further away.

      Perhaps my mobo isn't up to the task? Or is there a "Details" menu item I've missed? Below is the text found in the event log associated with the chkdsk. This is the GUID you are referring to, I assume - but how can I make the match?

    \\?\Volume{434cd624-aeca-11dd-b9ab-eb524cca701c
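
      One possible way to match a volume GUID like this to a mount point is the `mountvol` command, which, run without arguments, lists each volume name followed by its mount points. The sketch below parses that listing from saved text. It assumes the usual layout (a `\\?\Volume{...}\` line followed by indented mount paths or a "*** NO MOUNT POINTS ***" marker); the function name, the GUIDs and the sample text are hypothetical, and whether this layout holds on a given WHS install is untested.

```python
import re

# A volume name line such as \\?\Volume{GUID}\ (trailing backslash optional)
VOLUME_RE = re.compile(r"^\\\\\?\\Volume\{([0-9a-fA-F-]+)\}\\?$")


def parse_mountvol(text):
    """Map each volume GUID (lower-cased) to its list of mount points."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = VOLUME_RE.match(line)
        if match:
            current = match.group(1).lower()
            volumes[current] = []
        elif not line:
            current = None  # a blank line ends the current entry
        elif current is not None and not line.startswith("***"):
            volumes[current].append(line)  # an indented mount path
    return volumes


# Hypothetical sample listing with made-up GUIDs.
sample = r"""
    \\?\Volume{11111111-2222-3333-4444-555555555555}\
        C:\fs\C\

    \\?\Volume{aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee}\
        *** NO MOUNT POINTS ***
"""

for guid, mounts in parse_mountvol(sample).items():
    print(guid, "->", mounts or "no mount points")
```

      Looking up the GUID from the event log in the resulting dictionary would then give the folder under c:\fs, if any, where that volume is mounted.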


      I ran a disk check at reboot and it found too many bad sectors -- I've got to get the disk out.

      Many thanks!

      Cheers,
        Gordon.

    Gordon
    Wednesday, November 12, 2008 12:31 AM
  • Disk Management is an MMC snap-in. The (confusingly named, in this case :) ) Disk Management add-in installs into the WHS console, and gives a more detailed view of the server storage than the console itself does.
    I'm not on the WHS team, I just post a lot. :)
    Wednesday, November 12, 2008 2:06 AM
    Moderator