Trivial error on backup -> Most backups lost


  • After the initial installation of WHS 2011, I successfully backed up 2 of my 3 computers (one large desktop, two smaller laptops).

    As a last step, I started the backup of the second laptop. Unlike the other two machines, it was connected over WLAN. The backup ran very slowly (somewhat expected). At roughly 40%, the laptop was manually put to sleep. After it woke up again, it reported that the backup had failed. Although I would prefer a backup to simply continue after a suspend/resume, I can accept that this is not the case.

    The problem came right afterwards: the server now claimed that the client backup database was corrupted. After a repair run, the backup of my desktop machine was lost.

    In two previous attempts at setting up the server I had similar problems with a corrupted client backup database, but I didn't document those attempts well, so I can't really say where things went wrong. After every setup attempt I wiped all 4 disks (diskpart, clean) and reinstalled from scratch.

    So my questions:

    1. Why does suspend/resume of a client while backing up corrupt the backup database? Shouldn't the server be able to deal with this?
    2. Why does a failed backup of one machine kill the backup of another machine?

    Thanks for any ideas/insights!

    Regards, Martin

    P.S. Some specs: The server is an HP MediaSmart EX490 with an upgraded CPU (E5700) and 4 GB RAM. The disks are a 1 TB WD Green system disk and three 2 TB WD Green data disks. Using Remote Desktop, I converted the three 2 TB disks into a 4 TB software RAID 5 and then moved all server folders to the RAID. I did not manually install any drivers (specifically not the Intel SATA driver).

    Sunday, January 29, 2012 11:08 AM

Answers

  • Q1: Suspending and resuming a client is unlikely to corrupt the backup database. Inconsistencies in the client backup database are mostly caused by an unexpected power outage on the server, or by a hardware issue or bad clusters on the server's disk.

    Q2: That is due to the design of the client backup database, which uses a single-instance approach. For example, if you have two Windows 7 clients, much of their content (such as system data) is identical, so during client backup only one instance of it is stored on the server. Strictly speaking, single instancing works at the cluster level, not the file level. Back to your question: if the inconsistency affects shared cluster data, every backup set that contains this data is treated as broken. Repairing the database tries to fix that, and in the worst case the backup is deleted. WHS 2011 has done a lot to improve this so that as many backups as possible can be recovered.
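
    To make the single-instance explanation concrete, here is a minimal toy model, not the actual WHS 2011 database format. It assumes clusters are identified by a SHA-256 hash of their contents, uses a hypothetical 4-byte CLUSTER_SIZE so the data stays readable, and applies the worst-case repair policy described above (delete every backup set that references a damaged cluster). The class and method names are illustrative only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set

CLUSTER_SIZE = 4  # hypothetical, tiny cluster size so the example stays readable


@dataclass
class SingleInstanceStore:
    """Toy model of a cluster-level single-instance backup store (not the WHS format)."""
    clusters: Dict[str, bytes] = field(default_factory=dict)     # cluster hash -> cluster data
    backups: Dict[str, List[str]] = field(default_factory=dict)  # backup set -> cluster hashes
    corrupt: Set[str] = field(default_factory=set)               # clusters found to be damaged

    def backup(self, name: str, data: bytes) -> None:
        refs = []
        for offset in range(0, len(data), CLUSTER_SIZE):
            chunk = data[offset:offset + CLUSTER_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.clusters.setdefault(key, chunk)  # identical clusters are stored only once
            refs.append(key)
        self.backups[name] = refs

    def broken_backups(self) -> Set[str]:
        # A backup set is broken if any cluster it references is damaged,
        # even when that cluster was originally written for another client.
        return {name for name, refs in self.backups.items() if self.corrupt.intersection(refs)}

    def repair(self) -> Set[str]:
        # Worst-case repair policy: delete every backup set that depends on a damaged cluster.
        broken = self.broken_backups()
        for name in broken:
            del self.backups[name]
        for key in self.corrupt:
            self.clusters.pop(key, None)
        self.corrupt.clear()
        return broken


if __name__ == "__main__":
    store = SingleInstanceStore()
    store.backup("desktop", b"WIN7SYS1DESK")
    store.backup("laptop", b"WIN7SYS1LAPT")
    store.backup("other", b"XXXXYYYY")
    # Simulate one shared cluster being damaged on the server disk.
    store.corrupt.add(hashlib.sha256(b"SYS1").hexdigest())
    print(sorted(store.repair()))  # ['desktop', 'laptop']
    print(sorted(store.backups))   # ['other']
```

    In this sketch a single damaged cluster that both Windows 7 clients share takes out both of their backup sets, while a backup with no shared clusters survives, which is the behaviour the answer describes for Q2.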

    If you continue to see backup database errors after running a repair, I would suggest running a disk repair on the server. Here are the steps:

    1. Open the Dashboard.
    2. Navigate to the Server Folders and Hard Drives tab.
    3. Select the Hard Drives sub-tab.
    4. For each hard drive listed, double-click it and click the "Check and repair" button to run a full check on the disk.

    Thanks.

    • Marked as answer by Tinue Monday, January 30, 2012 6:14 PM
    Monday, January 30, 2012 3:04 AM

All replies

  • Thank you! It may have been a coincidence: today the RAID volume was gone (its drive letter had disappeared), and the Windows event log contained a controller error. After a reboot, the drive is back, and I am running yet another backup database repair job.

    What could cause a "controller error"? These are the details:

    atapi
    EventID 11
    [ Qualifiers]  49156
    Level 2
    Task 0
    Keywords 0x80000000000000

    In previous installations I had also installed the Intel ATAPI driver, but got similar results.

    Is the machine broken?

    Monday, January 30, 2012 6:19 PM
  • What could cause a "controller error"?
    See this KB article for some troubleshooting steps.
    Is the machine broken?

    Probably.


    I'm not on the WHS team, I just post a lot. :)
    Tuesday, January 31, 2012 5:09 AM
  • Thank you Ken for your answer!

    After some more research I found that one of the disks in the array is degraded. It shows a SMART "Current Pending Sector Count" of 197 instead of 200, which I take to mean that a number of sectors (3?) have become unreadable. Apparently, reading such a sector leads to the "controller error" I observed (essentially a timeout).

    What I do not understand is how the operating system reacts to such an error, especially on a RAID 5 volume: my drive letter simply disappeared. After a reboot, the RAID volume was back.

    In a RAID 5 I would expect a read error to have no consequences except for a warning that a drive has gone bad. All data should still be readable, and the volume should remain writable. Instead, the volume disappeared in the middle of a backup, resulting in a corrupted backup database.
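
    The expectation above rests on textbook RAID 5 parity, sketched minimally here for a 3-disk array like this one (two data blocks plus one XOR parity block per stripe). For simplicity the sketch assumes parity always sits on the last disk, whereas real RAID 5 rotates parity across disks. It also assumes an unreadable block comes back as a clean read error (modelled as None); it does not model how the Windows Server 2008 R2 software RAID reacts when a disk times out instead, which may be what happened here. The function names are illustrative.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Lay out one stripe: the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(*data_blocks)]


def read_data(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one unreadable block (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise IOError("more than one block lost: RAID 5 cannot rebuild this stripe")
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe = list(stripe)
        # The XOR of all surviving blocks (data + parity) equals the missing block.
        stripe[missing[0]] = xor_blocks(*survivors)
    return stripe[:-1]  # drop the parity block


if __name__ == "__main__":
    stripe = write_stripe([b"ABCD", b"1234"])  # 3 disks: two data blocks + parity
    damaged = [None, stripe[1], stripe[2]]      # disk 0 returns a read error
    print(read_data(damaged))                   # [b'ABCD', b'1234']
```

    With one block lost, the data is rebuilt from the other two disks; only a second failure in the same stripe makes it unrecoverable.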

    So what is the purpose of RAID 5 if it reacts to a read failure even worse than a single disk does?

    Does anyone have experience with the software RAID in Windows Server 2008 R2 (i.e. WHS 2011)? Is this normal behaviour for this OS, or are there further issues with my system/hardware?

    Thanks! Martin

    Saturday, February 4, 2012 2:23 PM