Backup Cleanup at 91% after 16+ hours

  • Question

  • My WHS v1 has 452GB of backups stored (retaining 25 monthlies, 9 weeklies and 8 dailies), and backup cleanup has been taking progressively longer for the past several months. Today it has been cleaning up for over 16 hours and is at 91%. I've watched long enough to see it go from 90% to 91%, so it's not hung, but the server's interactive responsiveness right now is very poor.

    I've checked the logs, and the only thing of interest lately is a few device timeouts on one of the disks (4 in the last 16 hours; no more than that in the last 48). I've re-seated the SATA cables on all three disks.

    Any thoughts how typical this is and what can be done about it?

    Sunday, November 28, 2010 11:38 PM

All replies

  • If all the timeouts are on the same disk, it's likely that that disk is failing. Depending on the exact issue, this may have caused your server to step the IO mode back from UDMA 5 or 6 to PIO, which will result in very slow disk access and (usually) high CPU usage.

    If they're random (on all your disks), it could be that you have a failing disk controller (which could have the same effect).


    I'm not on the WHS team, I just post a lot. :)
    Monday, November 29, 2010 12:58 AM
    Moderator
  • These log entries are for one drive only, but I haven't yet convinced myself which physical drive is ahcix861 / IdePort1. [Any clue how to tell what IO mode they're actually operating in?] -- It's in the IDE driver's properties, on the Advanced tab. Sure enough, one of the drives is running in PIO. Hmm... I think I'll try a replacement cable and start sourcing a replacement drive.

    Given how rare and intermittent the timeouts are, and with no other hints in the error log or elsewhere, I want more positive confirmation before paying for a replacement 1TB drive.

    Monday, November 29, 2010 1:43 AM
  • If one of your drives is in PIO mode, either you've fallen victim to a known issue related to low-power modes on Windows computers, or you have a failing component. Try resetting the drive to UDMA mode (you can find a number of articles on Microsoft support about this) and see if the problem comes back. If it does, it's hardware.
    I'm not on the WHS team, I just post a lot. :)
    Monday, November 29, 2010 5:09 AM
    Moderator
  • If WHS ever gets done removing the drive, I'll try that. (The bar has advanced about 5% or so in the last 7 hours.) Thanks for the vectors.

    FWIW, one of the articles I found answered a nagging question. I was wondering about the probability that a failure in the drive's electronics would leave it operating in PIO rather than UDMA6. Apparently, stepping a drive that has experienced a number of device timeout errors down to lower DMA modes, and ultimately to PIO, is a default recovery strategy built into the drivers. So PIO mode is potentially a side effect of a root cause that's back in drive mechanical hardware.
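
    The step-down behavior described above can be sketched as a toy model. This is only an illustration, not the actual driver logic: the list of modes, the `threshold` of 6 cumulative errors per step, and the function name `mode_after_errors` are all assumptions made for the example.

```python
# Toy model of a driver that lowers the transfer mode as errors accumulate.
# The mode list and the per-step threshold are illustrative assumptions.
MODES = ["UDMA6", "UDMA5", "UDMA4", "UDMA3", "UDMA2", "UDMA1", "UDMA0", "PIO"]


def mode_after_errors(error_count, threshold=6, start="UDMA6"):
    """Return the transfer mode after `error_count` cumulative errors.

    Each full `threshold` of errors drops the mode one step, and the
    mode never goes below PIO (the last entry in MODES).
    """
    if error_count < 0 or threshold <= 0:
        raise ValueError("error_count must be >= 0 and threshold > 0")
    steps = error_count // threshold
    index = min(MODES.index(start) + steps, len(MODES) - 1)
    return MODES[index]


if __name__ == "__main__":
    for errors in (0, 5, 6, 12, 42, 100):
        print(errors, "errors ->", mode_after_errors(errors))
```

    In this model a handful of timeouts spread over weeks is enough to walk a drive all the way down to PIO, because the counter is cumulative and nothing ever steps the mode back up.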

    Monday, November 29, 2010 12:53 PM
  • So PIO mode is potentially a side effect of a root cause that's back in drive mechanical hardware.
    Replace "potentially" with "usually". Most computers will never fall victim to the issue relating to standby/hibernation, because Microsoft (mostly) fixed the causes of that a long time ago. You should still look at the easy fix first in this case, because it's free. :)

    I'm not on the WHS team, I just post a lot. :)
    Monday, November 29, 2010 1:57 PM
    Moderator
  • I guess my point was that reversion to PIO mode could be traced to a mechanical hardware failure (which in my experience accounts for the vast majority of disk drive failures), not just to electronic/logical failures of the interface (which in my experience are very rare).

    I will test the free fix, probably about Thursday, when the drive removal process completes. It's gone less than 1/3 of the way across the progress bar in almost 24 hours... I knew PIO was slow, but I never would have guessed it was that much slower. I'm not sure how slowing down the I/O channel in response to mechanical problems (timeouts) is supposed to solve anything. But that's another thread.
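
    The "about Thursday" guess is a linear extrapolation from the progress bar. A minimal sketch of that arithmetic follows, assuming the bar advances at a constant rate (which it may well not); the function name `estimate_remaining_hours` is made up for the example.

```python
def estimate_remaining_hours(fraction_done, hours_elapsed):
    """Linearly extrapolate the hours left from progress so far.

    Assumes progress continues at the same average rate, which is
    only a rough approximation for a disk that is throwing timeouts.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    if hours_elapsed < 0:
        raise ValueError("hours_elapsed must be >= 0")
    total_hours = hours_elapsed / fraction_done
    return total_hours - hours_elapsed


if __name__ == "__main__":
    # One third of the bar in 24 hours leaves roughly two more days.
    remaining = estimate_remaining_hours(1 / 3, 24)
    print(f"about {remaining:.0f} hours remaining")
```

    Since the bar was at less than 1/3 after almost 24 hours, the real remaining time would be somewhat more than this figure.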

    Tuesday, November 30, 2010 2:16 AM