Large database sync scenario

  • Question

  • I have been evaluating Sync Framework 4.0 for syncing data to Windows Phone 7. In particular, I'm looking at how to sync a large number of records, testing with a table containing 44,000 records.

    While the sync using the Isolated Storage client works (with batching enabled on the server), it takes 20 minutes to sync that volume of data, which would not be acceptable in a real application. By contrast, I can do a one-way download of the same data to the phone using WCF Data Services in around 3 minutes, which would be acceptable.

    I can see a possible solution, but I'd like to know whether it is feasible.
    First, instead of the Isolated Storage client, I would write my own sync provider that stores records in a file-based database - I am using Perst on Windows Phone 7, which meets this requirement. In theory, the initial sync could then be done server-side (perhaps even as an overnight batch job) and we would then transfer the resulting database files to the phone client and copy them into Isolated Storage.

    One thing that might make this easier if we have a large number of clients is if you provided the ability to generate a client-side database on the server one time (a kind of snapshot). When a new client needs to sync, you would take a copy of that server-side generated client data store, assign it a unique client GUID, and effectively clone the metadata you store on the server. That way we could transfer this new clone down to the new client and the server would be set up correctly to sync with it. Any chance of you supporting that scenario?
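    In pseudocode, the bootstrap flow I have in mind would look something like this (Python used purely for illustration; every name here is made up - none of it is an existing Sync Framework API):

```python
import shutil
import uuid

def build_snapshot(server_db, snapshot_path):
    """One-time (perhaps nightly) job: materialise a template client
    database on the server and note the anchor it was built at.
    `server_db` and its methods are hypothetical stand-ins."""
    anchor = server_db.current_anchor()
    server_db.export_client_store(snapshot_path)
    return anchor

def provision_client(snapshot_path, client_store_path, server_metadata, anchor):
    """Per new client: copy the template, assign a fresh client GUID, and
    clone the server-side sync metadata so the server treats the copy as
    a client that has already synced up to `anchor`."""
    shutil.copyfile(snapshot_path, client_store_path)
    client_id = uuid.uuid4()
    # The server now expects this client and will only send changes made
    # after the snapshot anchor on its first real sync.
    server_metadata[client_id] = {"anchor": anchor}
    return client_id
```

    The point is that the expensive full enumeration happens once per snapshot rather than once per client.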

    Andy

    Thursday, December 9, 2010 12:41 PM


All replies

  • Hi Andy,

    Regarding the snapshot-based client bootstrapping, as of now there is no plan to support this, but I'll take it up with the feature team.

    Regarding writing your own sync provider, I am guessing your clients would only sync with the server, not among themselves (a hub-and-spoke model). If so, you could implement your sync provider by inheriting from Microsoft.Synchronization.ClientServices.OfflineSyncProvider and using Perst as the data store underneath. This lets your sync provider be hooked into the CacheController Silverlight component of the Sync Framework 4.0 CTP. CacheController enables syncing with the service that you can build on the server side using the service DLL shipped in the CTP.

    This way you can keep your data store in Perst, which you could theoretically bootstrap using WCF Data Services as you have already tried, and use the CTP's components to perform incremental sync from there onwards.
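    Roughly, the provider shape looks like this. The real OfflineSyncProvider is a .NET abstract class, so treat the Python below purely as pseudocode for the hooks you would implement over Perst; the member names are paraphrased, not the CTP's exact signatures:

```python
class PerstOfflineSyncProvider:
    """Pseudocode for the hooks a local-store sync provider supplies to
    CacheController. `store` stands in for a Perst database; its methods
    (load_anchor, apply, ...) are illustrative, not a real API."""

    def __init__(self, store):
        self.store = store

    def get_server_blob(self):
        # The anchor/knowledge saved after the last successful sync;
        # sent to the service so it enumerates only newer changes.
        return self.store.load_anchor()

    def get_change_set(self):
        # Local changes made since the last sync, for upload.
        return self.store.pending_local_changes()

    def on_change_set_downloaded(self, change_set, server_blob):
        # Apply a downloaded batch and persist the new anchor in one
        # transaction, so a failure mid-sync never leaves data saved
        # without its matching anchor (or vice versa).
        with self.store.transaction():
            self.store.apply(change_set)
            self.store.save_anchor(server_blob)
```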

    Will this meet your requirements?

    Sameer

     

    Thursday, December 9, 2010 6:20 PM
  • Hi Andy,

    For large data sizes, Isolated Storage is probably not a good solution, since today we keep all data in memory. We would need sync provider implementations on Silverlight that can work with deferred loading of data in memory. Siaqodb has written a custom provider for Silverlight - http://siaqodb.com/?p=342. We hope that there will be more of these in the near future.

    We have been thinking of snapshot-like functionality for fast initialization of clients. This is a good idea and will improve initialization time for clients. I am not sure if it will make it into the next release, but we are investigating several ways to improve client sync time for large data sets.


    SDE, Sync Framework - http://www.giyer.com
    • Marked as answer by Ganeshan Saturday, December 11, 2010 2:11 AM
    Thursday, December 9, 2010 6:22 PM
  • Thanks Sameer,

    Yes - clients would only sync with the server. I will try to implement my own sync provider. Looking at the samples in the CTP, the WM65 sample seems to be the best example of something similar to what I want to do - would you agree?

    Andy

    Friday, December 10, 2010 10:43 AM
  • Thanks, Ganeshan,

    I would like to create a sync provider for Perst. As with Siaqodb, this has the advantage of being able to persist objects as they are deserialized, so they are not all held in in-memory collections.

    I am interested in what advice you would give to guarantee the integrity of a collection of records synced in this way. Although batching in the service is a good way of dividing the sync into multiple HTTP request-responses, there is the risk that communications will fail before all transfers have completed.

    What is the best way of safeguarding your solution against partial updates in the client-side cache?

    Andy

    Friday, December 10, 2010 10:49 AM
  • Hi Andy,

    We have put up a guideline for implementing clients for sync services (http://msdn.microsoft.com/en-us/library/gg299005(v=SQL.110).aspx). It describes how to maintain consistency with batching. Every batch that we generate can be committed/saved on the client (this involves saving the data, resolving conflicts if needed, and then finally saving the anchor), and sync will resume enumerating from the last point you stopped at if there is a network failure while downloading batches. The anchor/blob that is passed up and down with every sync request carries the sync knowledge for the scope the client is involved in.
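    The commit-per-batch behaviour can be sketched in a few lines (illustrative Python, not the product API): each batch is committed together with its anchor, so a re-sync after a failure resumes from the last saved anchor without losing or duplicating data.

```python
def download_batches(server_batches, client, fail_after=None):
    """Pull batches starting from the client's saved anchor. Each batch is
    committed together with its anchor, so a network failure between
    batches (or mid-batch, before commit) loses nothing already saved."""
    start = client["anchor"]               # resume point from the last sync
    for i, batch in enumerate(server_batches[start:], start):
        if fail_after is not None and i - start >= fail_after:
            raise ConnectionError("network dropped mid-sync")
        # Commit data + anchor as one unit (a transaction in a real store).
        client["rows"].extend(batch)
        client["anchor"] = i + 1

client = {"anchor": 0, "rows": []}
batches = [["a", "b"], ["c"], ["d", "e"]]
try:
    download_batches(batches, client, fail_after=1)   # fails after 1 batch
except ConnectionError:
    pass
download_batches(batches, client)                     # re-sync resumes
# client["rows"] is now ["a", "b", "c", "d", "e"] with no duplicates
```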

    Please have a look and let me know if you have any questions.

    -Ganeshan


    SDE, Sync Framework - http://www.giyer.com
    • Marked as answer by Ganeshan Saturday, December 11, 2010 2:10 AM
    Friday, December 10, 2010 5:36 PM