Loss Of Data!

  • General discussion

  •  Under certain circumstances, data is not preserved in file replication.

    A file on an NTFS drive that has alternate NTFS streams is not replicated in its entirety. The alternate streams are lost when replicating.

    Steps to demonstrate:

    On MachineA, go to a command prompt.
    "cd C:\Meshed"  (Where Meshed is a Live Folder, and C: is an NTFS volume.)
    "echo This file exists. > C:\Meshed\mesh.txt"
    "echo This stream exists. > C:\Meshed\mesh.txt:MeshStream"
    "type C:\Meshed\mesh.txt"   ->  returns [This file exists.]
    "more < C:\Meshed\mesh.txt:MeshStream" -> returns [This stream exists.]

    Allow sufficient time for the file to replicate to MachineB (where the volume is NTFS as well).
    On MachineB, go to a command prompt.
    "type C:\Meshed\mesh.txt"   ->  returns [This file exists.]
    "more < C:\Meshed\mesh.txt:MeshStream" -> errors [File not Found.]

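The steps above can also be scripted. This is only a minimal sketch, assuming Python is available on both machines; the helper names `write_file_with_stream` and `read_stream` are made up for illustration. On an NTFS volume under Windows, opening `file:name` addresses the alternate stream called `name`. On other file systems the colon may simply be treated as part of an ordinary file name, so the result only says something about streams on NTFS.

```python
def write_file_with_stream(path, body, stream_name, stream_body):
    """Write the main data stream, then one named alternate stream."""
    with open(path, "w") as f:
        f.write(body)
    # On NTFS, "path:stream_name" addresses the alternate data stream.
    with open(f"{path}:{stream_name}", "w") as f:
        f.write(stream_body)


def read_stream(path, stream_name):
    """Return the stream's text, or None if the stream cannot be opened."""
    try:
        with open(f"{path}:{stream_name}") as f:
            return f.read()
    except OSError:
        return None
```

On MachineA one would call `write_file_with_stream` on the file in the Live Folder; on MachineB, `read_stream` returning `None` after replication corresponds to the [File not Found.] error above.
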
    Proposed solution that you guys should refine:
    Check that the file is on an NTFS volume and whether any alternate data streams exist. If so, encapsulate each stream into its own enclosure in the feed.
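
To make the "one enclosure per stream" idea concrete, here is a rough sketch. It assumes an Atom-style entry, which is my assumption and not a description of Mesh's actual feed format. The function name `build_entry`, the `?stream=` query parameter, and the use of the link `title` attribute to carry the stream name are all hypothetical conventions chosen for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(file_name, base_url, streams):
    """Build an Atom entry with one enclosure link per stream.

    `streams` maps stream name -> size in bytes; "" is the main data stream.
    """
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    title = ET.SubElement(entry, f"{{{ATOM_NS}}}title")
    title.text = file_name
    for stream_name, size in streams.items():
        link = ET.SubElement(entry, f"{{{ATOM_NS}}}link")
        link.set("rel", "enclosure")
        link.set("length", str(size))
        href = f"{base_url}/{urllib.parse.quote(file_name)}"
        if stream_name:
            # Hypothetical convention: name the alternate stream in the
            # query string and in the link's title attribute.
            href += "?stream=" + urllib.parse.quote(stream_name)
            link.set("title", stream_name)
        link.set("href", href)
    return entry
```

For example, `build_entry("mesh.txt", "https://example.invalid/Meshed", {"": 20, "MeshStream": 22})` yields an entry with two enclosure links (the sizes here are placeholders). A client that does not understand the extra enclosure could ignore it and still fetch the main one.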

    Obvious obstacle to overcome: Stream enumeration is usually done with either BackupRead() and BackupWrite() (Win32 APIs), FindFirstStreamW/FindNextStreamW (Win32 APIs available from Windows Server 2003, Vista, or Server 2008), or through Native APIs (NtQueryInformationFile). This will probably be a bear to call since Mesh is written in .NET, which doesn't really support streams.
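
As a rough illustration of the FindFirstStreamW/FindNextStreamW route, here is a sketch that calls the Win32 functions through Python's ctypes (Python rather than .NET, purely to keep the example short). The helper names `split_stream_name` and `list_streams` are mine. The enumeration part only works on Windows and I have not run it against a Live Folder, so treat it as a starting point, not a tested implementation.

```python
import ctypes
import sys

MAX_PATH = 260
ERROR_HANDLE_EOF = 38
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value
FIND_STREAM_INFO_STANDARD = 0  # FindStreamInfoStandard


class WIN32_FIND_STREAM_DATA(ctypes.Structure):
    _fields_ = [
        ("StreamSize", ctypes.c_longlong),                  # LARGE_INTEGER
        ("cStreamName", ctypes.c_wchar * (MAX_PATH + 36)),  # e.g. ":name:$DATA"
    ]


def split_stream_name(raw):
    """Split ':MeshStream:$DATA' into ('MeshStream', '$DATA').

    The main data stream is reported as '::$DATA', giving ('', '$DATA').
    """
    _, name, stream_type = raw.split(":", 2)
    return name, stream_type


def list_streams(path):
    """Return [(stream_name, size_in_bytes), ...] for a file (Windows only)."""
    if sys.platform != "win32":
        raise OSError("FindFirstStreamW is only available on Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.FindFirstStreamW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32]
    k32.FindFirstStreamW.restype = ctypes.c_void_p
    k32.FindNextStreamW.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    k32.FindNextStreamW.restype = ctypes.c_int
    k32.FindClose.argtypes = [ctypes.c_void_p]
    k32.FindClose.restype = ctypes.c_int

    data = WIN32_FIND_STREAM_DATA()
    handle = k32.FindFirstStreamW(
        path, FIND_STREAM_INFO_STANDARD, ctypes.byref(data), 0)
    if handle == INVALID_HANDLE_VALUE:
        err = ctypes.get_last_error()
        if err == ERROR_HANDLE_EOF:  # the file has no streams at all
            return []
        raise ctypes.WinError(err)
    streams = []
    try:
        while True:
            name, _ = split_stream_name(data.cStreamName)
            streams.append((name, data.StreamSize))
            if not k32.FindNextStreamW(handle, ctypes.byref(data)):
                err = ctypes.get_last_error()
                if err != ERROR_HANDLE_EOF:
                    raise ctypes.WinError(err)
                break  # ERROR_HANDLE_EOF: no more streams
    finally:
        k32.FindClose(handle)
    return streams
```

On the mesh.txt file from the steps above, `list_streams` should report the unnamed main stream plus `MeshStream`. A .NET version would need the equivalent P/Invoke declarations.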

    Second issue: If Mesh regenerates feeds entirely on a detected update, there will be problems. If only the parts that actually changed are updated, then you may be able to preserve the data across unsupported devices.
    Example: Summary information can be added to any file, even ones not based on XML or OLE Structured Storage (old Word docs). If Summary Information is added to a text file on your PC (which is NTFS) and this file is eventually synced with your SmartPhone (which will be either RAW or FAT), all the Summary Info is lost when replicating back. If only the main enclosure is updated, then the Summary Info can be preserved. ... (From the Channel9 videos I saw, I don't believe Cloud Storage is an NTFS volume either...)
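
The "only update the main enclosure" rule could look something like the sketch below. It assumes the feed item can be modelled as a mapping from stream name to content, with "" as the main stream. That model, the function name `merge_update`, and the `device_supports_streams` flag are hypothetical and not anything Mesh is known to expose.

```python
def merge_update(stored, incoming, device_supports_streams):
    """Merge an update from a device into the stored copy of a file.

    `stored` and `incoming` map stream name -> bytes; "" is the main stream.
    """
    if device_supports_streams:
        # The sender could see every stream, so a missing stream
        # really was deleted there. Take the update as-is.
        return dict(incoming)
    # The sender (e.g. a FAT device) never saw the alternate streams,
    # so replace only the main stream and keep everything else.
    merged = dict(stored)
    merged[""] = incoming[""]
    return merged
```

With this rule, an edit made on the SmartPhone replaces the file body but leaves a stored SummaryInformation stream in place for the trip back to the PC.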

    p.s. * "Streams"  is a handy utility for enumerating alternate data streams.
    * CodeProject's community-contributed .NET wrapper class may be helpful as a starting point.
    * An MSDN sample includes a property page shell extension (a new tab when you check the properties of a file) that lists alternate streams on a file.

    With all the Stink about Rootkits in the past few years, I'm surprised streams don't get more attention.
    • Edited by WillFa Wednesday, June 25, 2008 6:38 PM Added References
    Wednesday, June 25, 2008 5:50 PM

All replies

  • Hi Will,

    Live Mesh does not support NTFS streams today.  It is something that we considered, but was cut from the initial release.  You point out many of the issues that we thought about as well.  We will most probably consider alternate stream support in the future, especially if there is demand for it from the user base.


    Richard Chung [ Live Mesh ]
    Wednesday, June 25, 2008 8:09 PM
  • Thanks very much for your reply Richard.

    After performing a quick scan of my hard drive, here are some common streams that are out there (more for everyone else's benefit than yours, since I trust you know what you're doing):

    encryptable used by Explorer on thumbs.db - not an issue since hidden files don't sync.

    favicon used by IE/Explorer on .url files - the locally cached icon (I'm presuming for when a web connection is unavailable, maybe just for immediate display while waiting on an async callback to check if it's updated)

    Zone.Identifier used by IE/Explorer on a lot of files - a security feature introduced in XP SP2. It's what triggers the "Hey, you downloaded this from the internet" dialog box when opening a file/executable. Is stripping security information good?

    OEStandardProperty used by Windows Live Mail on .eml / .rss files - User's state information. With all the requests about syncing PSTs (which I can't see how it'd be done since Outlook locks them for exclusive access), I liked WLM as a fat client since it uses a file hierarchy for a message store. Stripping out state information (which ones have been read) degrades my user experience. With WLM on both Desktop and Laptop, items read on the Laptop aren't marked read on the PC.

    [Chr(5)]SummaryInformation used by Explorer on almost any file - Mentioned above, but relisted for completeness. Is destroying User entered data ever a good thing?
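
To show what is actually lost with the Zone.Identifier stream listed above: its content is a small INI-style text block with a `[ZoneTransfer]` section and a `ZoneId` value. The sketch below parses that text; the function name `zone_of` is mine, and reading the stream from disk is left out since that part only works on NTFS.

```python
import configparser

# Standard Internet Explorer security zone numbers.
ZONES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def zone_of(stream_text):
    """Return the numeric ZoneId stored in a Zone.Identifier stream."""
    parser = configparser.ConfigParser()
    # Normalize Windows line endings before parsing.
    parser.read_string(stream_text.replace("\r\n", "\n"))
    return parser.getint("ZoneTransfer", "ZoneId")
```

A file downloaded from the internet typically carries `ZoneId=3`. Once the stream is stripped during sync there is nothing left to parse, and no warning is shown when the file is opened.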

    Granted, NTFS streams aren't very prevalent, and really the only company I know of using them is Microsoft. Given my personal priorities, numbers 1 and 2 are easy to live without;
    #3 gives me pause since it's a security feature you're bypassing. (What if someone sharing my folder downloads a trojan named after a hotfix? I want the "from an Untrusted Zone" warning when trying to open it after it syncs back to my machine.);
    #4 affects and annoys me personally, though I can't really claim a "greater common good" since I have no idea how prevalent WLM usage really is. Two "Live" branded apps that don't really interoperate/integrate seems sketchy.
    #5 is unforgivable in my opinion.

    <friendly ribbing>
    You're Microsoft! The company that gave us UAC and Outlook unilaterally blocking 'unsafe' attachments. Just because it's annoying or difficult is no reason not to do it.

    Thanks again. I really do like the concept of the product. :)
    Wednesday, June 25, 2008 10:04 PM