Advanced questions about the sync framework.

  • Question

  • Hi,

     

    I am a file system kernel-level developer, and I have the following questions about file system synchronization as well as the sync framework.

     

    Is there support for opened files as well as locked files?

    • For instance, syncing a user's Outlook PST file while Outlook remains open?

    Does the File System synchronization support NTFS Alternate Data Streams (ADS)?

    • If not, are there plans to do this, or is it necessary to implement a solution using the framework in order to support files with ADS?

    Does the framework support hard links, junctions or reparse points?

    • In several situations it is necessary to reproduce these functionalities on the remote/replica sites.

    Does the file system synchronization sync a file's last-changed and last-access timestamp attributes?

    • In NTFS there is a file attribute containing the timestamp of the last change, of which most users and developers are unaware (and I am not referring to the last modification time).

    Does the framework support the Windows transactional file system (TxF)?

    • Syncing files/directories should be done only on committed transactions.

    With the hash option enabled, how would it sync a 10 GB file?

    • Is there a single-pass hash of the entire file, such that no sync is required if the hashes match?
    • Is there a single pass with multiple intermediate hash values computed over specific regions of the file?
    • Are there multiple passes between the origin and the remote, with hashes of regions of the file?

    Can the hashing algorithm and the number of bits (32, 64, 128, 256) be specified?

     

    Thanks...

    Thursday, November 8, 2007 2:59 PM

Answers

  • Thanks for the quick response Rafik. Steven, I wanted to clarify a couple of points in the response above.

     

    - We don't support alternate streams today, but this support is definitely in the plans for an upcoming release. Keep in mind, however, that even with this support, if the file system on either side is not NTFS, the alternate streams will not be preserved.

     

    - Can you please describe the scenario where you want to sync an open Outlook PST file? It seems a little dangerous to me. Whether we "support" it today depends on whether Outlook opens the file in exclusive mode or not. If the file is opened exclusively by Outlook, we will fail to write to it. If the file is not opened by Outlook in exclusive mode but was changed on the destination since the start of the sync, we will skip the change because we don't want to overwrite conflicting data. The other thing to worry about is whether Outlook can handle it appropriately if the file changes underneath it.
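
    To illustrate the sharing behavior (this is just a rough sketch, not our provider code; the PST path and the error handling are made-up examples): a second process can only read the file if the first opener allowed sharing, otherwise the open fails with a sharing violation.

        // Sketch only (not Sync Framework code): shows why a file opened
        // exclusively by another process (e.g. a PST held by Outlook) cannot
        // be read by a second process.
        #include <windows.h>
        #include <stdio.h>

        int wmain()
        {
            // Hypothetical path, purely for illustration.
            HANDLE h = CreateFileW(L"C:\\Users\\steven\\outlook.pst",
                                   GENERIC_READ,
                                   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                   NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h == INVALID_HANDLE_VALUE)
            {
                if (GetLastError() == ERROR_SHARING_VIOLATION)
                    wprintf(L"File is opened exclusively by another process - cannot read it for sync.\n");
                else
                    wprintf(L"Open failed, error %lu\n", GetLastError());
                return 1;
            }
            wprintf(L"File opened for shared read - contents could be copied.\n");
            CloseHandle(h);
            return 0;
        }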

     

    - Regarding reparse points and junctions: we explicitly do not follow directory junctions, because in our view that's dangerous given there can be cycles, and it is also wrong in many cases (take the case where, in Vista, for backward compatibility the user's Documents folder has junction points for My Pictures, My Videos, etc. - it's not right to follow the link in this case). As far as file reparse points go, we will read through the reparse point and sync the file normally. When syncing TO a file that's a reparse point, the reparse point will be lost and we will replace it with a normal file. This supports the Windows Home Server sync scenario, where the home server load-balances files across disks but uses reparse points to maintain a single namespace. In this case it's OK to lose the reparse point because the home server will load-balance again as appropriate. We do not do anything special with hard links. If you have two links pointing to the same file in the namespace you're synchronizing, you will end up with two file copies on the other side. If you have specific scenarios where you would like to see more support in this area, I would love to hear them.
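
    For illustration only (a sketch, not how our provider is written; the function name is hypothetical), a tool can tell a directory junction apart from an ordinary file or a file reparse point by looking at the reparse tag that FindFirstFileW reports:

        // Sketch only: classify a path so a sync tool can decide whether to
        // follow it, skip it, or recreate it on the replica.
        #include <windows.h>
        #include <stdio.h>

        void ClassifyPath(const wchar_t* path)
        {
            WIN32_FIND_DATAW fd;
            HANDLE h = FindFirstFileW(path, &fd);
            if (h == INVALID_HANDLE_VALUE) { wprintf(L"Not found: %s\n", path); return; }
            FindClose(h);

            if (fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)
            {
                // dwReserved0 holds the reparse tag when the reparse attribute is set.
                if (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT)
                    wprintf(L"%s is a junction / mount point - do not follow.\n", path);
                else if (fd.dwReserved0 == IO_REPARSE_TAG_SYMLINK)
                    wprintf(L"%s is a symbolic link.\n", path);
                else
                    wprintf(L"%s is a reparse point (tag 0x%08lX) - read through it.\n",
                            path, fd.dwReserved0);
            }
            else if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                wprintf(L"%s is an ordinary directory.\n", path);
            else
                wprintf(L"%s is an ordinary file.\n", path);
        }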

     

    - We sync the file's creation time and last write time. I was not aware there was another timestamp available for modification time - I will look into this. What is this attribute called? Do you know if there are applications that care about the other modification timestamp? We will probably never sync the last access time, because that can cause performance to degrade with a large number of unnecessary file syncs.

     

    - We don't support TxF today - it's a lower-priority request on our list currently. We do ensure today that when we write the file stream you won't end up with a partial copy - we use the normal trick of copying to a temp file and then moving it to the correct location. I understand that this does not give you the durability that you may require - but then again, I would have to understand the scenario requirements better where this is really important. Most applications don't use TxF today, and in general people are fine with the robustness provided by today's applications that read and write from the file system.
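
    The temp-file trick looks roughly like the sketch below (illustration only, not our actual code; the function name and the ".synctmp" suffix are invented for the example):

        // Sketch only: avoid leaving a partial destination file by staging the
        // content in a temp file and renaming it into place at the end.
        #include <windows.h>
        #include <stdio.h>

        bool CopyWithoutPartialFile(const wchar_t* source, const wchar_t* destination)
        {
            // Hypothetical temp name next to the destination.
            wchar_t temp[MAX_PATH];
            swprintf_s(temp, L"%s.synctmp", destination);

            if (!CopyFileW(source, temp, FALSE))          // FALSE = overwrite temp if present
            {
                wprintf(L"Copy failed, error %lu\n", GetLastError());
                return false;
            }
            // The rename within the same volume replaces the destination in one
            // step, so a reader sees either the old complete file or the new one.
            if (!MoveFileExW(temp, destination,
                             MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH))
            {
                wprintf(L"Move failed, error %lu\n", GetLastError());
                DeleteFileW(temp);
                return false;
            }
            return true;
        }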

     

    - With respect to hashing - it's a single-pass hash algorithm today. We hash the file stream during change detection, and if the hash matches what we had seen the last time, we won't sync the file. It's not possible to specify the hash algorithm or the number of bits - please describe a scenario where this would be useful/necessary. Also keep in mind that hashing is somewhat costly - for each sync operation, the file will be hashed on both sides once for change detection, and it will be hashed again on the destination before we write to it, to ensure that nothing got in after change detection was done and modified the file stream.
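
    As an illustration of single-pass change detection (a sketch only - the algorithm and digest size used by the framework are not being documented here; SHA-256 via Windows CNG and the function name are my own choices, and the example assumes a Windows version where CNG can allocate the hash object internally):

        // Sketch only: single-pass SHA-256 over a file stream, for change
        // detection by comparing against the digest saved from the last sync.
        // Build note: link with bcrypt.lib.
        #include <windows.h>
        #include <bcrypt.h>
        #include <vector>

        bool HashFileSha256(const wchar_t* path, UCHAR digest[32])
        {
            HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                      OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
            if (file == INVALID_HANDLE_VALUE) return false;

            BCRYPT_ALG_HANDLE alg = NULL;
            BCRYPT_HASH_HANDLE hash = NULL;
            bool ok = false;
            if (BCryptOpenAlgorithmProvider(&alg, BCRYPT_SHA256_ALGORITHM, NULL, 0) == 0 &&
                BCryptCreateHash(alg, &hash, NULL, 0, NULL, 0, 0) == 0)
            {
                std::vector<UCHAR> buffer(1 << 20);       // 1 MB read buffer
                DWORD read = 0;
                ok = true;
                while (ReadFile(file, buffer.data(), (DWORD)buffer.size(), &read, NULL) && read > 0)
                {
                    if (BCryptHashData(hash, buffer.data(), read, 0) != 0) { ok = false; break; }
                }
                if (ok)
                    ok = (BCryptFinishHash(hash, digest, 32, 0) == 0);  // SHA-256 = 32 bytes
            }
            if (hash) BCryptDestroyHash(hash);
            if (alg)  BCryptCloseAlgorithmProvider(alg, 0);
            CloseHandle(file);
            return ok;
        }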

     

    Thanks

    Ashish Shah

    MSF Developer

    Friday, November 9, 2007 6:58 PM
    Answerer

All replies

  • Hi Steven,

     

    The Sync Services for File Systems component of the framework is designed to work against FAT as well as NTFS - think of synchronizing the contents of a USB memory drive. With that in mind, here are short answers to your questions:

     

    1- Well, if the file is already opened with shared read, then the sync provider should be able to open it for read and send it across. Otherwise, it won't. In order to sync an opened file, you need a file system filter driver, which is not part of the framework.

     

    2- No alternate stream support

     

    3- No support for junctions or reparse points

     

    4- I believe not, but I need to double-check this one.

     

    5- No support for transactional NTFS (TxF)

     

    6- This is just the first CTP; file hashing and remote compression are not supported at this point.

     

     

    These are all good requests and I will make sure to pass them along to the file sync provider team.

     

    Thanks

     

    Friday, November 9, 2007 5:21 AM
  • Ashish, thanks for the response.

     

    >>- Can you please describe the scenario where you want to sync an open Outlook PST file?

    >>Seems a little dangerous to me. ... The other thing to worry about is whether Outlook can

    >>handle it appropriately if the file changes underneath it.

     

    The model to which I most commonly refer is a situation in which computer A is designated as the master file system and computer B is designated as the replica file system. On computer A, applications (such as Outlook) are running, while computer B is not running any applications. Thus computer A can have files open, while computer B does not have any files open at the time of the sync.

     

    There are several scenarios regarding opened files: files opened with read sharing or exclusive access, memory-mapped files, and locked files. Each of these has solutions that allow a process to read the contents of the opened file. It is even possible without a kernel-mode driver (I know, since I have done it in a user-mode process).

     

    A file can be open for a long time, and waiting for the file to be closed before issuing the sync may mean that a lot of content has to be synced at once, taking a considerable amount of time. Thus syncing opened files is useful to save time, since at the time of the close the remaining difference will be small (we sometimes refer to this as the last mile - the last changes since the last sync).

     

    This approach makes the most sense combined with a hashing solution, in which only the differences are sent to the replica.
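
    To sketch the idea (purely an illustration of the concept, not anything the framework does; the 64 KB block size, the FNV-1a hash, and the function names are arbitrary choices of mine): hash fixed-size blocks on both sides and transfer only the blocks whose hashes differ.

        // Sketch only: fixed-size block hashing so that only changed regions of
        // a large file (e.g. 10 GB) would need to be transferred. FNV-1a keeps
        // the illustration short; a real design would use a cryptographic or
        // rolling hash.
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        static uint64_t Fnv1a(const unsigned char* data, size_t len)
        {
            uint64_t h = 14695981039346656037ULL;
            for (size_t i = 0; i < len; ++i) { h ^= data[i]; h *= 1099511628211ULL; }
            return h;
        }

        // Hash a file as a list of per-block digests (64 KB blocks assumed).
        std::vector<uint64_t> HashBlocks(const wchar_t* path)
        {
            std::vector<uint64_t> hashes;
            FILE* f = _wfopen(path, L"rb");
            if (!f) return hashes;
            std::vector<unsigned char> block(64 * 1024);
            size_t n;
            while ((n = fread(block.data(), 1, block.size(), f)) > 0)
                hashes.push_back(Fnv1a(block.data(), n));
            fclose(f);
            return hashes;
        }

        // Compare source and replica block lists; the indices returned are the
        // only blocks that would have to be sent across.
        std::vector<size_t> ChangedBlocks(const std::vector<uint64_t>& src,
                                          const std::vector<uint64_t>& dst)
        {
            std::vector<size_t> changed;
            size_t count = src.size() > dst.size() ? src.size() : dst.size();
            for (size_t i = 0; i < count; ++i)
                if (i >= dst.size() || i >= src.size() || src[i] != dst[i])
                    changed.push_back(i);
            return changed;
        }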

     

    As you stated, it is unknown how Outlook will react when opening an 'unclosed' image of the file on computer B. However, for applications such as SQL Server or Exchange Server, which support crash recovery, this concept is useful.

     

    >>- Regarding reparse points and junctions. We explicitly do not follow directory junctions
    >>because in our view that's dangerous given there can be cycles and also wrong in many

    >>cases..

     

    How can a Windows feature be dangerous? :-) But seriously, I understand what you are referring to. In my experience, in a replica scenario where the junction's source and target are both part of the set of files being replicated, it is recommended to recreate the junction on the replica file system. Otherwise, when only one side of the junction is part of the files being replicated, the files should be synced to the replica file system as regular files instead of creating the junction there.
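
    A rough sketch of that policy (illustration only; the function names are hypothetical, CreateSymbolicLinkW is used as a stand-in because creating a true NTFS junction requires FSCTL_SET_REPARSE_POINT, the symbolic-link privilege is assumed, and the "\\?\" prefix returned by GetFinalPathNameByHandleW is glossed over):

        // Sketch only: if a junction and its target both live under the
        // replicated root, recreate the link on the replica; otherwise let the
        // files be copied as ordinary files.
        #include <windows.h>
        #include <wchar.h>

        // Resolve where a junction points by opening it with the link followed
        // and asking for the final path.
        bool ResolveTarget(const wchar_t* linkPath, wchar_t* target, DWORD targetChars)
        {
            HANDLE h = CreateFileW(linkPath, 0,
                                   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                   NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
            if (h == INVALID_HANDLE_VALUE) return false;
            DWORD len = GetFinalPathNameByHandleW(h, target, targetChars, FILE_NAME_NORMALIZED);
            CloseHandle(h);
            return len > 0 && len < targetChars;
        }

        bool IsUnderRoot(const wchar_t* path, const wchar_t* root)
        {
            return _wcsnicmp(path, root, wcslen(root)) == 0;
        }

        void HandleJunction(const wchar_t* junction, const wchar_t* replicatedRoot,
                            const wchar_t* replicaJunctionPath, const wchar_t* replicaTargetPath)
        {
            wchar_t target[MAX_PATH * 2];
            if (ResolveTarget(junction, target, ARRAYSIZE(target)) &&
                IsUnderRoot(target, replicatedRoot))
            {
                // Both ends are replicated: recreate the link on the replica.
                CreateSymbolicLinkW(replicaJunctionPath, replicaTargetPath,
                                    SYMBOLIC_LINK_FLAG_DIRECTORY);
            }
            // else: only one side is replicated; fall back to copying real files.
        }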

     

    >>- We sync the file's creation time and last write time. I was not aware there was another

    >>timestamp available for modificaton time - will look into this...

     

    See the FILE_BASIC_INFORMATION structure (http://msdn2.microsoft.com/en-us/library/aa491634.aspx); the field to which I refer is called ChangeTime. It is a field that is not visible from the command shell or from Windows Explorer, yet it keeps track of the time of the latest change. Unlike the last write time, which records only when the data was written, this field is updated whenever there is any change to the file. It is also uncommonly difficult for a developer to revert the value of this field to an earlier timestamp, unlike the last write time, where anyone can do a 'touch' to reset the timestamp to an earlier or later point in time, such as 2002-02-05 5:00:00 PM.
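
    The field is readable from user mode, no kernel driver needed; a small sketch (the function name is mine, and GetFileInformationByHandleEx with FileBasicInfo requires Vista or later; FILE_BASIC_INFO mirrors the kernel's FILE_BASIC_INFORMATION):

        // Sketch only: reading ChangeTime from user mode.
        #include <windows.h>
        #include <stdio.h>

        void PrintChangeTime(const wchar_t* path)
        {
            HANDLE h = CreateFileW(path, FILE_READ_ATTRIBUTES,
                                   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                   NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
            if (h == INVALID_HANDLE_VALUE) { wprintf(L"Open failed: %lu\n", GetLastError()); return; }

            FILE_BASIC_INFO info;   // CreationTime, LastAccessTime, LastWriteTime, ChangeTime, FileAttributes
            if (GetFileInformationByHandleEx(h, FileBasicInfo, &info, sizeof(info)))
            {
                FILETIME ft, local;
                SYSTEMTIME st;
                ft.dwLowDateTime  = info.ChangeTime.LowPart;
                ft.dwHighDateTime = info.ChangeTime.HighPart;
                FileTimeToLocalFileTime(&ft, &local);
                FileTimeToSystemTime(&local, &st);
                wprintf(L"ChangeTime: %04u-%02u-%02u %02u:%02u:%02u\n",
                        st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute, st.wSecond);
            }
            CloseHandle(h);
        }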

     

    >>- We don't support TxF today - ... Most applicaitons dont use TxF today and in general people

    >>are fine with the robustness provided by today's applications that read and write from the file

    >>system.

     

    I would like to offer a correction on this point: installation programs are being rewritten to make use of transactions when the Windows OS supports them, and I know of other products which are integrating transactional support.

     

    The time at which the sync is done can be significant, since the replica may be given a 'stale' copy of the files when there is a transaction ongoing. Therefore the sync needs to be re-executed once the transaction is committed. I think what would be needed is a notification/event mechanism to indicate when the transaction has committed.

     

    Conclusion,

     

    Whether to support alternate data streams, the lastAccess and lastChange timestamp attributes, junctions, reparse points, and transactions depends upon whether the sync framework is designed to be a mechanism for an exact replica of a file system.

     

    From my experience, a solution that chooses not to support some file system features will cause issues later on. I have seen applications crash because they expected an ADS on a file to exist, or fail to take action because lastWrite or lastChange was not up to date at the replica site, or because a junction point did not exist.

     

    It is my suggestion that those at Microsoft working on the file system sync project(s) consider these factors, and document what is and what is not supported, before developers using the file system sync framework encounter these issues.

     

    Good luck on the project.

    Wednesday, November 21, 2007 4:01 AM