Does defragging your client machines every night cause excessive backups?

Question
I noticed that when WHS is backing up my client machine, the little status box says "determining which clusters have changed" or something similar. So it sounds like it's not doing a file-based backup but perhaps backing up only the changed clusters on my hard drive? If so, and I defrag my client machine, which shuffles all my files around on the hard drive but of course doesn't change the archive bits on any files, will WHS see a ton of changed clusters and do a huge backup every time I defrag? Or does it go file by file and check the archive bit, in which case a client defrag won't cause excessive backups every night for files that haven't really changed?
Thursday, October 29, 2009 6:41 AM
Answers
To be specific, I'm using MyDefrag and running the "nightly" defrag, which really just consolidates the free space. I do a full defrag once a month, so at that time there's major file movement going on. But even with the nightly defrag, I notice quite a few files get shuffled around to consolidate the free space.
I found some great info in this document that totally answers my question! http://download.microsoft.com/download/1/8/0/18096c95-4850-4176-9821-970691b98aaf/Windows_Home_Server_Technical_Brief_-_Home_Computer_Backup_and_Restore.docx.
In short, it looks like the answer is that defragging will cause any file on your local HD that has been moved to be checked again, but it will not cause any kind of duplication in your WHS backup folders. So your local machine will be pretty busy checking all those files during a backup operation, but defrag will have no effect on bloating backups on your WHS server at all.
Here are the relevant sections from the doc above that really explain what's going on:
The home computer backup solution in Windows Home Server has a single-instance store at the cluster level. Clusters are typically collections of data stored on the hard drive, 4 kilobytes (KB) in size. Every backup is a full backup, but the home server only stores each unique cluster once. This creates the restore-time convenience of full backups (you do not have to repeat history) with the backup time performance of incremental backups.
The home computer backup occurs as follows:
· When a home computer is backed up to the home server, Windows Home Server software figures out what clusters have changed since the last backup.
· The home computer software then calculates a hash for each of these clusters and sends the hashes to the home server. A hash is a number that uniquely identifies a cluster based on its contents.
· The home server looks into its database of clusters to see if they are already stored on the home server.
· If they are not stored on the home server already, then the home server asks the home computer to send them.
· All file system information is preserved such that a hard disk volume (from any home computer) at any backup point (time) can be reconstituted from the database.
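Here's my rough sketch of one backup round as the bullets above describe it. It's a toy model, not WHS code: SHA-256 as the hash, fixed 4 KB clusters, and the `HomeServer`/`backup` names are all my own assumptions, and "which clusters changed" is simplified to comparing hashes position by position against the previous backup (the brief doesn't say how WHS actually detects changes).

```python
import hashlib

CLUSTER_SIZE = 4096  # assumed 4 KB clusters, as in the brief


def cluster_hashes(volume: bytes) -> list[str]:
    """Hash each cluster of a volume image (SHA-256 is an assumption)."""
    return [hashlib.sha256(volume[i:i + CLUSTER_SIZE]).hexdigest()
            for i in range(0, len(volume), CLUSTER_SIZE)]


class HomeServer:
    """Hypothetical server side: clusters stored once, keyed by hash."""

    def __init__(self) -> None:
        self.clusters: dict[str, bytes] = {}

    def missing(self, offered: list[str]) -> set[str]:
        # Look the offered hashes up and report the ones not yet stored.
        return {h for h in offered if h not in self.clusters}

    def receive(self, payload: dict[str, bytes]) -> None:
        # Store the clusters the server asked for.
        self.clusters.update(payload)


def backup(server: HomeServer, volume: bytes, previous: list[str]):
    """Run one backup; return (snapshot, clusters examined, clusters sent)."""
    current = cluster_hashes(volume)
    # Simplification: a cluster "changed" if its hash differs from the last
    # backup at the same position. WHS's real change detection is not modeled.
    changed = [i for i, h in enumerate(current)
               if i >= len(previous) or previous[i] != h]
    # Offer the hashes of the changed clusters; the server replies with gaps.
    wanted = server.missing([current[i] for i in changed])
    # Send only clusters whose contents the server does not already hold.
    payload = {current[i]: volume[i * CLUSTER_SIZE:(i + 1) * CLUSTER_SIZE]
               for i in changed if current[i] in wanted}
    server.receive(payload)
    # The complete hash list is what lets any backup point be rebuilt later.
    return current, len(changed), len(payload)
```

In this model the first backup of a volume sends every cluster, and running `backup` again on an unchanged volume with the returned snapshot sends nothing.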
The copies of the clusters are stored on the home server in the order they are received. Additionally, the home server stores metadata files which allow it to deduce which clusters belong to which home computer at which times. The metadata that is typically stored for each home computer usually takes less than 5 percent of the space of the clusters that are backed up.
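To make the "single-instance store plus metadata" idea concrete, here's a minimal sketch. The `BackupDatabase` class and its layout are hypothetical, not the real WHS file format: one dictionary holds each unique cluster once, and a catalog records, for each computer and backup time, the full ordered list of cluster hashes, so every backup point is "full" and can be restored on its own. As a rough sanity check on the overhead, a bare 32-byte SHA-256 digest per 4 KB cluster is 32/4096, about 0.8 percent, so a metadata figure under 5 percent seems plausible even with extra bookkeeping.

```python
import hashlib

CLUSTER_SIZE = 4096


class BackupDatabase:
    """Hypothetical layout: one cluster store plus per-computer, per-time catalogs."""

    def __init__(self) -> None:
        self.clusters: dict[str, bytes] = {}                 # hash -> contents, stored once
        self.catalog: dict[tuple[str, str], list[str]] = {}  # (computer, time) -> hashes in disk order

    def record(self, computer: str, when: str, volume: bytes) -> None:
        hashes = []
        for i in range(0, len(volume), CLUSTER_SIZE):
            cluster = volume[i:i + CLUSTER_SIZE]
            digest = hashlib.sha256(cluster).hexdigest()
            self.clusters.setdefault(digest, cluster)  # keep only the first copy
            hashes.append(digest)
        # Every backup point gets a complete list, i.e. a "full" backup.
        self.catalog[(computer, when)] = hashes

    def restore(self, computer: str, when: str) -> bytes:
        # Reconstitute the volume by looking up each hash in order.
        return b"".join(self.clusters[h] for h in self.catalog[(computer, when)])
```

Backing up a second computer with the same contents adds a catalog entry but no new clusters, which is the cross-machine sharing the brief mentions.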
Some of the implications of this backup approach are:
· The first backup always needs to send the entire contents of the disk to the home server.
· The first backup of a different home computer needs to read all of its data but only transmit the content of those clusters that the home server does not already have from another home computer.
· Subsequent backups after little change typically only read a small amount of data on the home computer and send the new clusters to the home server.
· After you defragment a home computer, the next backup may need to read a large amount of data from the disk (many files will appear to have been changed), but it only sends a relatively small amount of data to the home server because the clusters have been reorganized, but they have not changed.
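A toy simulation of that last bullet, assuming each disk position holds one cluster and the server already has every cluster from an earlier backup. The "defrag" here just shuffles identical clusters into new positions; `next_backup_cost` is a made-up helper that counts what would have to be read versus what would have to be sent.

```python
import hashlib
import random

CLUSTER_SIZE = 4096


def hashes(clusters: list[bytes]) -> list[str]:
    return [hashlib.sha256(c).hexdigest() for c in clusters]


def next_backup_cost(before: list[bytes], after: list[bytes],
                     stored: set[str]) -> tuple[int, int]:
    """Return (clusters to read and hash, clusters to send) for the next backup."""
    old, new = hashes(before), hashes(after)
    # Any position whose contents differ looks "changed" and must be read.
    changed = [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
    # Only contents the server has never seen need to cross the network.
    to_send = {new[i] for i in changed} - stored
    return len(changed), len(to_send)


# 100 distinct 4 KB clusters, all on the server after an earlier backup.
before = [i.to_bytes(4, "big") * (CLUSTER_SIZE // 4) for i in range(100)]
stored = set(hashes(before))

# Toy "defrag": identical clusters, shuffled into new positions.
after = before[:]
random.Random(0).shuffle(after)

read, sent = next_backup_cost(before, after, stored)
print(f"clusters read: {read}, clusters sent: {sent}")  # sent is 0 here
```

Nearly every position reads as changed, but nothing new is sent, which matches the brief: a defrag makes the client busy, not the server's store bigger.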
- Marked as answer by esassaman Friday, October 30, 2009 4:59 AM
Friday, October 30, 2009 4:58 AM
All replies
My understanding of WHS's backup system is that it creates an indexed database of clusters for each machine it backs up and only stores one copy of each unique cluster in order to save space. It's a great system considering that if you have multiple computers running the same OS and many of the same applications, it only needs to store one copy of the needed information to restore all the systems on the network.
That would tend to make me think that as long as the content of a cluster is unchanged, there should be no reason to copy it again during a backup. I think it just changes the db record in the backup to note the change in location.
Defragging every night does seem a bit excessive, but I don't think it will have much impact on the backup process.
Thursday, October 29, 2009 8:27 AM
It depends. I can't find the particular threads where this was discussed, but if you have a non-default cluster size on your client computer's hard disk, it's possible that a disk defragmenting utility will make Windows Home Server believe that large sections of the disk have changed. If your disks use the default cluster size of 4 KB, then you don't really have to worry about it.
I'm not on the WHS team, I just post a lot. :)
Thursday, October 29, 2009 3:04 PM, Moderator
That concurs with what I've read, but it is very authoritative and a good explanation. Conceptually, my short version is that WHS stores one unique copy of each cluster across all the connected computers. Each backup then just creates a database snapshot of which clusters were on which computers on what dates. It only adds new clusters when they are modified or created during normal use.
Whoever dreamt this up is way above my pay grade! Very clever.
Friday, October 30, 2009 5:54 AM
Actually, defragging will significantly increase backup time/size for most default client installs. This is because even though defrag utilities do not change the content of the (generally 4 KB) clusters, they do reorganise them. Since Windows Home Server uses VSS, which is based on a 16 KB cluster size, to determine the changed clusters, many will appear changed, causing increased backup size/time.
Saturday, October 31, 2009 3:51 AM, Moderator
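One way to picture the granularity argument is this toy model. It assumes, purely for illustration, that changes are both tracked and deduplicated on aligned 16 KB blocks while the defragmenter moves 4 KB clusters; the brief quoted above says clusters are hashed individually, so this illustrates the mismatch rather than describing WHS internals. `bytes_sent_after_move` is a hypothetical helper.

```python
import hashlib
import random

CLUSTER = 4 * 1024  # file-system cluster moved by the defragmenter
BLOCK = 16 * 1024   # assumed change-tracking unit, per the reply above


def unit_hashes(volume: bytes, unit: int) -> list[str]:
    return [hashlib.sha256(volume[i:i + unit]).hexdigest()
            for i in range(0, len(volume), unit)]


def bytes_sent_after_move(before: bytes, after: bytes, unit: int) -> int:
    """Bytes the server would need if changes were tracked and deduplicated per `unit`."""
    old = unit_hashes(before, unit)
    stored = set(old)
    new = unit_hashes(after, unit)
    # A unit must be sent if it differs at its position and its contents are new.
    missing = {h for i, h in enumerate(new) if h != old[i] and h not in stored}
    return len(missing) * unit


clusters = [i.to_bytes(4, "big") * (CLUSTER // 4) for i in range(64)]
before = b"".join(clusters)
random.Random(1).shuffle(clusters)  # toy defrag: same 4 KB clusters, new order
after = b"".join(clusters)

print(bytes_sent_after_move(before, after, CLUSTER))  # 0: every 4 KB cluster is already stored
print(bytes_sent_after_move(before, after, BLOCK))
```

At 16 KB granularity most blocks become new combinations of old clusters and would have to be sent. If hashing stayed per 4 KB cluster, a coarser tracking unit would mainly cost extra reading and hashing rather than extra storage, which is essentially the point the next reply raises.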
But the tech paper referenced above says "The Windows Home Server backup engine stores data at the cluster level. Clusters are typically 4 kilobytes (KB) in size. The backup database records include clusters and hashes of these clusters" and "You can maximize the efficiency of the home server backup database by ensuring that all of the hard drives on your home computers are formatted with NTFS and with a cluster size of 4 KB". So what are you saying? It doesn't actually look at individual clusters?
Saturday, October 31, 2009 8:14 AM