unable to remove drive through console, super slow drive performance, high cpu unexplained by task manager, console becomes unavailable

    Question

  • I am having severe problems with hard drive speed and CPU usage in WHS PP3. This is a bit long and detailed, sorry, but I think I need to be verbose to describe this problem.

    Things worked fine until sometime yesterday, and then the problems began.  The start of the problems did not coincide with any changes to the WHS software. I do have add-ons such as Disk Management, but WHS worked fine for some time before this problem simply appeared.

    Here is when I first noticed problems.

    I began to transfer a large 10GB movie file from my PC to my WHS.  The process was moving along at a brutally slow 1.9Mbps.

    I tried turning off file duplication on my WHS movies folder to try and speed things up, but this did not help. I turned file duplication back on.

    I physically logged into the WHS (I have a keyboard and monitor attached) to check the Task Manager Processes and Performance tabs. Very odd. CPU usage was 50%+, and the second CPU core on my AMD 5050e was pretty much pinned at 50% the entire time.  Yet in the Processes tab, even with "show processes from all users" checked, System Idle Process typically remained in the high 90% range, and individual processes did not come even close to 50%.  At momentary intervals I would see demigrator hit 30 to 40%, but only for very brief moments; it was typically in the single digits.

    I do have a WD hard drive that I want to remove from the storage pool.  The WD drive is an RMA replacement recently sent to me by WD, to replace a drive that became unreadable.  Unfortunately, the replacement drive has an issue of its own: 19ms access times, while its identical twin in my system has 12ms access times.  So WD is sending out yet another replacement for the defective replacement.

    The replacement drive had been in my system for over 3 weeks with no problems.  The slow speed was affecting ALL drives on my WHS.  Still, wondering if that particular drive had somehow started experiencing further problems that affected the rest of the system, I shut down, physically disconnected the drive and restarted WHS.  The symptoms persisted.  With or without that drive in the storage pool, the above issues remained: slow access to all hard drives and unexplained high CPU usage. I shut down and reconnected the WD drive.  While the WD drive was disconnected, I was warned that my PC backup database had errors, so this drive likely contains at least part of my PC backups.

    I began a remove drive procedure from within the Disk Management add-on tab, so I could observe drive performance.  I could see a transfer from the WD drive to another drive happening, but it was again at a brutally slow average of about 1,500 KBps (about 1.5 MBps). These drives would normally average about 50,000 KBps (about 50 MBps), especially on internal transfers. With 95% of 500GB to transfer, this would clearly take WAAAY too long.
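
    To put that rate in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the add-in figures correspond to roughly 1.5 MB/s observed versus 50 MB/s normal, and that 1 GB is 1000 MB; the function name is made up for illustration.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate (1 GB taken as 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s / 3600


# 95% of a 500 GB drive still has to be moved off.
remaining_gb = 0.95 * 500

slow = transfer_hours(remaining_gb, 1.5)   # observed rate (assumed ~1.5 MB/s)
normal = transfer_hours(remaining_gb, 50)  # usual rate (assumed ~50 MB/s)

print(f"at 1.5 MB/s: {slow:.0f} hours (~{slow / 24:.1f} days)")
print(f"at 50 MB/s:  {normal:.1f} hours")
```

    Under those assumptions the removal would need well over three days at the slow rate, against under three hours at the normal one.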

    A possibly separate but related issue: I went back to my client PC so I could observe the process from there (more convenient), yet I could NOT get the console started on the client PC. It would give the "backup service is down" message (as expected), but would then refuse to launch the console.  Worse yet, when I went back to the WHS PC, on which I was still logged in, the WHS console was no longer running on the WHS either.  I could not re-launch the console from the WHS desktop, nor from a client PC.

    I had no choice but to restart the WHS.  Immediately after the restart, I logged into the WHS desktop and ran HD Tune on the WD drive and the other storage drives.  HD Tune had no problem getting transfer rates of 50 MBps+ from the drives.  WHS used to get that sort of performance from the drives, but it was currently getting only 1.5 MBps.  I restarted the console, re-initiated the WHS drive removal procedure, and went to bed.  When I woke up, the WHS console was no longer running on the WHS desktop. Odd. Once again, trying to relaunch the console on the WHS desktop, or from a client PC, failed.

    I'm not sure how best to submit this bug to Connect, since the cab file capture is typically done through the console, yet a major part of the problem is the console failing to launch.

    Phew - that's it for now - thanks for reading....
    • Edited by dpkform Thursday, August 27, 2009 5:43 PM this forum desperately needs a "preview" before submit option for edits :-)
    Thursday, August 27, 2009 5:16 PM

All replies

  • another update

    I have begun using the BDBB add-in to create a backup of my current PC backups, to a network folder share on my client PC.

    That way I should be able to preserve my most recent PC backups, even after removing the drive I am going to RMA again for slow access times.

    This operation appears to be going normally, with HDD transfer rates of 30 to 50 MBps, according to the Disk Management add-in.

    Also, my CPU usage is more normal, running between 8% and 25%.  It is still showing a total slightly higher than explained by the individual processes in the Processes tab of Task Manager, but the second CPU core is no longer pegged at 50% usage.  The system also seems more responsive.

    I'm assuming demigrator essentially shuts down during a BDBB backup? Could the problems I am having be related to a problem with demigrator?  How would I investigate further?
    Thursday, August 27, 2009 7:00 PM
  • Yes, BDBB shuts down Drive Extender.

    What you describe sounds like you have one or more failing hard drives, or a failing hard disk controller (or a bad cable...). You may want to run chkdsk on all the drives in your server.

    I'm not on the WHS team, I just post a lot. :)
    Thursday, August 27, 2009 8:04 PM
    Moderator
  • Thanks, Ken, for the tip and the "how to" link. I will try chkdsk when my BDBB backup completes.

    Oddly, the Task Manager Performance tab is back to reporting an average of over 50% CPU usage, with the second core once again pegged at 50% but showing numerous closely spaced dips down to about 30%. The first core is very steady at about 15%.

    Here is the CPU time history for approximately the last 5 hours, from the Processes tab of the WHS Task Manager. Again, it is very odd that the Performance tab CPU graph shows the processor as much busier than the Processes tab indicates:

    Image Name            User Name        CPU Time
    System Idle Process   system           4:36:15
    Home Server Console   administrator    0:11:18
    demigrator.exe        system           0:06:22
    searchindexer         system           0:05:31
    system (?)            system           0:03:14
    taskmgr               administrator    0:00:40
    qsm.exe               system           0:00:36
    svchost               system           0:00:36
    cqvsvc                system           0:00:23
    • Edited by dpkform Thursday, August 27, 2009 10:46 PM formattting
    Thursday, August 27, 2009 10:44 PM
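
    The gap described in the post above can be quantified with a small sketch that tallies the CPU-time column. The figures are copied from that post; the helper name is hypothetical, and treating the listed times as the full picture is a simplifying assumption.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' CPU-time string from Task Manager into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# CPU-time column copied from the post above.
cpu_times = {
    "System Idle Process": "4:36:15",
    "Home Server Console": "0:11:18",
    "demigrator.exe": "0:06:22",
    "searchindexer": "0:05:31",
    "system": "0:03:14",
    "taskmgr": "0:00:40",
    "qsm.exe": "0:00:36",
    "svchost": "0:00:36",
    "cqvsvc": "0:00:23",
}

idle = to_seconds(cpu_times["System Idle Process"])
busy = sum(
    to_seconds(t) for name, t in cpu_times.items() if name != "System Idle Process"
)
busy_share = busy / (idle + busy)

print(f"listed processes: {busy} s busy, {idle} s idle, {busy_share:.1%} busy")
```

    The listed processes account for less than a tenth of the recorded CPU time, far below the 50%+ on the Performance tab. One possible explanation, consistent with the PIO-mode finding later in the thread, is that CPU time spent servicing disk interrupts is not charged to any listed process; that is an assumption, not something the thread confirms.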
  • Hope this formatting works... Here are the chkdsk results from the D: "drive". There do not seem to be any significant problems so far...?

    The other 4 spawned DOS windows are still processing chkdsk. I will post those results when they are complete.

    C:\Documents and Settings\Administrator\Desktop>chkdsk D: /x /r
    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
    77328 file records processed.
    File verification completed.
    17 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    72569 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    285844 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    77328 security descriptors processed.
    Security descriptor verification completed.
    4681 data files processed.
    CHKDSK is verifying Usn Journal...
    36789208 USN bytes processed.
    Usn Journal verification completed.
    CHKDSK is verifying file data (stage 4 of 5)...
    77312 files processed.
    File data verification completed.
    CHKDSK is verifying free space (stage 5 of 5)...
    72553644 free clusters processed.
    Free space verification is complete.
    Correcting errors in the uppercase file.
    Windows has made corrections to the file system.

     291595814 KB total disk space.
       1157816 KB in 72604 files.
         34856 KB in 4682 indexes.
             0 KB in bad sectors.
        188566 KB in use by the system.
         65536 KB occupied by the log file.
     290214576 KB available on disk.

          4096 bytes in each allocation unit.
      72898953 total allocation units on disk.
      72553644 allocation units available on disk.

    C:\Documents and Settings\Administrator\Desktop>chkdsk C: /x /r
    The type of the file system is NTFS.
    Cannot lock current drive.

    Chkdsk cannot run because the volume is in use by another
    process.  Would you like to schedule this volume to be
    checked the next time the system restarts? (Y/N) y

    This volume will be checked the next time the system restarts.

    C:\Documents and Settings\Administrator\Desktop>for /d %1 in (C:\fs\*) do start chkdsk /x /r %1

    C:\Documents and Settings\Administrator\Desktop>start chkdsk /x /r C:\fs\F

    C:\Documents and Settings\Administrator\Desktop>start chkdsk /x /r C:\fs\V

    C:\Documents and Settings\Administrator\Desktop>start chkdsk /x /r C:\fs\X

    C:\Documents and Settings\Administrator\Desktop>start chkdsk /x /r C:\fs\Z

    C:\Documents and Settings\Administrator\Desktop>

    Friday, August 28, 2009 8:41 AM
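
    As a quick sanity check on the chkdsk D: summary above, the reported figures can be cross-checked against each other. This is only an illustrative sketch using the numbers from that output; the variable names are made up.

```python
# Figures copied from the chkdsk D: summary above (sizes in KB).
total_kb = 291_595_814
in_files_kb = 1_157_816
in_indexes_kb = 34_856
bad_sectors_kb = 0
system_kb = 188_566          # the 65,536 KB log file is counted inside this figure
available_kb = 290_214_576

cluster_bytes = 4096
total_clusters = 72_898_953
free_clusters = 72_553_644

# 1. The usage categories should add up to the reported total.
accounted_kb = in_files_kb + in_indexes_kb + bad_sectors_kb + system_kb + available_kb

# 2. Free clusters times cluster size should equal the available space.
free_from_clusters_kb = free_clusters * cluster_bytes // 1024

# 3. Total clusters times cluster size should be within one cluster of the total.
total_from_clusters_kb = total_clusters * cluster_bytes // 1024

print("categories add up:", accounted_kb == total_kb)
print("free space matches:", free_from_clusters_kb == available_kb)
print("total vs clusters:", total_kb - total_from_clusters_kb, "KB difference")
```

    The numbers are internally consistent and show 0 KB in bad sectors, which fits the impression that this volume has no significant problems.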
  • I have had a similar problem in the past where one of my drives was dropping into PIO mode - which really slowed things down.
    In my case I think it was associated with the fact that I spin the drives down after they aren't used for a while and WHS was recording an error if they took too long to spin up.
    If this is the case you may find this link helpful - see http://winhlp.com/node/10 - I haven't had any problems since following the advice on that page.
    It could of course be related to a failing disk?
    • Marked as answer by dpkform Friday, August 28, 2009 9:10 PM
    Friday, August 28, 2009 3:06 PM
  • Thanks Lant for the tip - I will check it out

    Ken...

    I have been running chkdsk as per your script for almost 16 hours.  I don't think I see any problems mentioned, but chkdsk seems mostly hung.

    The script you provided spawned 4 additional command windows.  I have 4 data drives and 1 system drive.  Only 1 of those spawned command windows still seems to be executing, slowly.  The window that is still running is 35% complete in stage 5 of 5, and still incrementing the free clusters processed count.

    The other 3 windows seem stuck at stage 4 of 5, at 16, 11, and 10 percent complete. Their stage 4 file data counters no longer appear to be incrementing.

    FWIW, the s.m.a.r.t. utility reports no problems with any of the disks.

    I have double checked that all cables seem snug.  If a cable/connection/sata port has gone bad, I'm not sure the best method to try isolating the issue.  I have tried isolating the RMA replaced drive, and it made no difference.  I guess I can try the other drives one by one, and then observe performance?

    Do you think I should abort the script?  I am thinking to try Lant76's suggestion, and failing that, attempting a server reinstallation.  What do you think?

    Should I file a bug report with MS connect?

    In case a bad disk is suspected because I mentioned the RMA replacement disk with its 19ms access time: I don't think that disk has any other issues. I have been using it for 3 weeks just fine.  Interestingly, I just received a replacement for the replacement, and this disk also shows 19ms access times.  Internet forums show other users with this issue on the same drive model.  I currently have WD level 2 tech support doing some investigating for me.  The spec access time is 13ms. My never-replaced drive of the same model (I originally purchased 2 of them) shows 13ms access times.  The drive model, for what it is worth, is WD5000AAKS.

    Here is the chkdsk data so far...

    still executing window.....

    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
    2320 file records processed.
    File verification completed.
    0 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    0 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    8685 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    2320 security descriptors processed.
    Security descriptor verification completed.
    455 data files processed.
    CHKDSK is verifying file data (stage 4 of 5)...
    2304 files processed.
    File data verification completed.
    CHKDSK is verifying free space (stage 5 of 5)...
    35 percent complete. (39315405 of 141186327 free clusters processed)


    The other 3 "hung" windows:

    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
     7 percent complete. (45708 of 65296 file records processed)
    Deleting corrupt attribute record (128, "")
    from file record segment 48730.
    65296 file records processed.
    File verification completed.
    8 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    0 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    212721 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    65296 security descriptors processed.
    Security descriptor verification completed.
    Inserting data attribute into file 48730.
    3802 data files processed.
    CHKDSK is verifying file data (stage 4 of 5)...
    16 percent complete. (15000 of 65280 files processed)

    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
    29440 file records processed.
    File verification completed.
    11 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    0 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    109420 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    29440 security descriptors processed.
    Security descriptor verification completed.
    2206 data files processed.
    CHKDSK is verifying file data (stage 4 of 5)...
    11 percent complete. (5000 of 29424 files processed)

    The type of the file system is NTFS.
    Volume dismounted.  All opened handles to this volume are now invalid.
    Volume label is DATA.

    CHKDSK is verifying files (stage 1 of 5)...
    77200 file records processed.
    File verification completed.
    48 large file records processed.
    0 bad file records processed.
    0 EA records processed.
    0 reparse records processed.
    CHKDSK is verifying indexes (stage 2 of 5)...
    285266 index entries processed.
    Index verification completed.
    5 unindexed files processed.
    CHKDSK is verifying security descriptors (stage 3 of 5)...
    77200 security descriptors processed.
    Security descriptor verification completed.
    4590 data files processed.
    CHKDSK is verifying file data (stage 4 of 5)...
    10 percent complete. (15000 of 77184 files processed)

    many thanks for the help and insight.....

    • Edited by dpkform Friday, August 28, 2009 5:47 PM
    Friday, August 28, 2009 5:27 PM
  • Hanging in chkdsk like that is indicative of a problem with your storage subsystem. So no, it's not expected behavior. The spawning of multiple windows is completely normal; that batch file is intended to do exactly that in the interests of getting everything over with a little sooner.

    Since it's multiple disks, I'm more suspicious of your HBA (probably your chipset IDE or SATA ports) than I am of the individual disks.

    I'm not on the WHS team, I just post a lot. :)
    Friday, August 28, 2009 7:30 PM
    Moderator
  • Thanks Lant76 and Ken for your help.

    Checking Device Manager, I noticed one of my two primary IDE channels was operating in PIO mode. I deleted the checksum values as per the linked instructions, and things seem normal again.  I have once again started the process of removing the WD data drive, and the Disk Management add-in shows disk transfer rates of 40 to 55 MBps, just like they should be.

    I will keep an eye on things to see if any hardware issues creep back up.

    Thanks again, Ken and Lant76.
    • Marked as answer by dpkform Friday, August 28, 2009 9:16 PM
    Friday, August 28, 2009 9:14 PM
  • IDE channels tend to drop back to PIO mode for a reason. Usually that reason is a dying disk; by reducing the speed with which the disk is accessed, read, and written, the OS can "prolong the agony" a bit. Every time a certain error threshold is reached, Windows will drop back one level, from UDMA X to X-1, etc, until you get to PIO mode.

    So you should try to find the problem, because it's probably still there.

    I'm not on the WHS team, I just post a lot. :)
    Friday, August 28, 2009 10:05 PM
    Moderator
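
    The step-down behavior described in the post above can be pictured with a toy model. This is only a sketch: the simplified mode list (real hardware also has multiword DMA modes, omitted here), the threshold of 6, and the function name are illustrative assumptions, not values taken from the thread.

```python
# Toy model only: the mode list and the default threshold are illustrative
# assumptions, not values confirmed by the thread.
MODES = ["PIO", "UDMA 0", "UDMA 1", "UDMA 2", "UDMA 3", "UDMA 4", "UDMA 5", "UDMA 6"]


def mode_after_errors(start_mode: str, error_count: int, threshold: int = 6) -> str:
    """Return the transfer mode after error_count errors.

    Every `threshold` errors the channel drops one level, and it never
    drops below PIO.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    level = MODES.index(start_mode)
    steps_down = error_count // threshold
    return MODES[max(level - steps_down, 0)]


for errors in (0, 5, 6, 20, 100):
    print(errors, "errors ->", mode_after_errors("UDMA 6", errors))
```

    The point of the model is that the fallback is one-way: once enough errors accumulate, the channel ends up in PIO mode and stays slow until the underlying cause is found or the recorded state is reset.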
  • Thanks, Ken. I'll keep an eye on things. A couple of days before I noticed the issues, I removed my old IDE DVD-ROM drive from the server. Maybe that somehow triggered a fluke event? So far every drive is still running in UDMA mode 6, except for my older system drive, which is in DMA mode 5, but I think that is normal for that drive. There is no data on my system drive. One of my 500GB HDDs, the WD that has NOT been RMA'd ever, did show an indication of "unhealthy", but it was repaired in a minute with the repair function. Failure prediction on the drive was "false". One other odd thing: I got a message that the backup database has errors, but that may relate back to the problem I was having. So, yes, there is enough reason to keep an eye on things.
    Tuesday, September 01, 2009 7:50 AM
  • Found this thread via google and I had the same problem which is now fixed via the node10 link further up..

    One question tho: are your drives SATA or IDE drives? I've got 4 drives in my system, all WD SATA drives, but they're set up on the board as IDE, so I was thinking maybe that was part of the problem.
    I've managed to move all my data off of the drives now (took a while at 3.5Mb/s until I found this).
    Going to reinstall WHS tonight with the drives in AHCI mode and the appropriate drivers.

    Can't think of anything specific I did with the machine when it started to play up, other than set up a few PLAs around the house, unless it was when I was messing around with the cache policies in Device Manager...
    Friday, September 04, 2009 8:57 AM
  • Also found this thread via google. Not running Power Pack 3 (yet), but I had backups go from normal times to super slow over the past two weeks.

    In my case, I did have a drive with SMART predicting failure. I've removed that drive (a painful process -- just so slow that I just disconnected it because all shared folders were duplicated).

    Speed back to normal. Doing backups on all computers (all my backups were lost but yet it still showed 830 GB used for backups -- odd -- assuming that this is sectors with no backups referencing them that will get cleaned up during the next backup cleanup).

    After I have all my backups done I will reconnect the drive and run the WD diagnostics. Regardless of diagnostics I will still do an RMA. Just not going to trust this 2 TB drive anymore...
    Monday, September 07, 2009 4:22 AM
  • ...
    After I have all my backups done I will reconnect the drive and run the WD diagnostics.
    ...
    Don't. At least don't reconnect it to your server. It's still set up as a storage pool drive, and you may terminally confuse Windows Home Server if you reconnect it.

    Regarding painful removal of a drive that's in the process of failing, please take a look at, and vote and verify on, this product suggestion on Connect.

    I'm not on the WHS team, I just post a lot. :)
    Monday, September 07, 2009 2:29 PM
    Moderator