How to: use a DataSetSurrogate instead of a DataSet to improve Sync Framework performance over WCF?

  • Question

  • I am trying to reduce the amount of traffic caused by using the sync framework over a WCF service.

     

    My current goal is to use binary formatting in conjunction with a DataSetSurrogate class.  I'm using netTcpBinding on my WCF service, which uses WCF's binary message encoding by default.  That reduces the size of the requests/responses somewhat, but a DataSet still serializes itself internally to XML, and that XML is what the binary encoding ends up carrying over the wire.

     

    The suggested approach for dealing with this is to use a DataSetSurrogate for serialization, which can do a true binary serialization/deserialization. 
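    For illustration, here is a minimal sketch of the round-trip I have in mind. It assumes the DataSetSurrogate class from Microsoft's sample, i.e. a constructor that takes a DataSet and a ConvertToDataSet() method that rebuilds one; everything else is just plumbing:

```csharp
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class SurrogateRoundTrip
{
    // 'changes' stands in for the DataSet of sync changes. DataSetSurrogate is
    // the class from Microsoft's sample (assumed ctor and ConvertToDataSet()).
    public static DataSet RoundTrip(DataSet changes)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            // True binary payload -- no internal XML serialization involved.
            formatter.Serialize(stream, new DataSetSurrogate(changes));
            stream.Position = 0;

            // Rebuild a full DataSet on the receiving side.
            var surrogate = (DataSetSurrogate)formatter.Deserialize(stream);
            return surrogate.ConvertToDataSet();
        }
    }
}
```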

     

    Links:

    I understand in theory that I'm supposed to send a DataSetSurrogate back and forth over the wire instead of serializing a DataSet, but I don't understand how to do that. 

     

    One avenue would seem to be to somehow convince WCF on the server and the client to use a custom instance of BinaryFormatter, with BinaryFormatter.SurrogateSelector set.  Since Microsoft's implementation of DataSetSurrogate isn't a true "surrogate" (it doesn't implement ISerializationSurrogate), I would need to implement that myself.  However, I don't know how to make WCF use a custom BinaryFormatter instance.
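    To make that concrete, here is a sketch of what I mean by implementing the surrogate myself. The ISerializationSurrogate and SurrogateSelector parts are standard .NET serialization APIs; the DataSetSurrogate constructor and ConvertToDataSet() come from Microsoft's sample; the class names and wiring are my own guess:

```csharp
using System.Data;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// A true ISerializationSurrogate that substitutes the sample's DataSetSurrogate
// whenever a BinaryFormatter encounters a DataSet.
class DataSetSerializationSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
    {
        // Store the binary-friendly wrapper instead of the DataSet itself.
        info.AddValue("ds", new DataSetSurrogate((DataSet)obj));
    }

    public object SetObjectData(object obj, SerializationInfo info, StreamingContext context,
                                ISurrogateSelector selector)
    {
        // Rebuild the original DataSet from the stored wrapper.
        var wrapper = (DataSetSurrogate)info.GetValue("ds", typeof(DataSetSurrogate));
        return wrapper.ConvertToDataSet();
    }
}

static class SurrogateFormatter
{
    // This formatter works for manual serialization; the open question is how
    // to make WCF use it instead of its own serializer.
    public static BinaryFormatter Create()
    {
        var selector = new SurrogateSelector();
        selector.AddSurrogate(typeof(DataSet),
                              new StreamingContext(StreamingContextStates.All),
                              new DataSetSerializationSurrogate());
        return new BinaryFormatter { SurrogateSelector = selector };
    }
}
```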

     

    Another avenue would seem to be to convince Sync Framework to instantiate a DataSetSurrogate to wrap the DataSet that contains the changed data instead of using the DataSet itself.  Currently it appears that DbServerSyncProvider::GetChanges is what is creating the DataSet, and I'm not sure how I would override that or insert myself into that process.
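    If GetChanges can be overridden (it appears to be virtual, though I haven't verified that), intercepting it would look roughly like the sketch below; the comment marks the part I don't know how to solve:

```csharp
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization.Data.Server;

class SurrogateServerSyncProvider : DbServerSyncProvider
{
    public override SyncContext GetChanges(SyncGroupMetadata groupMetadata,
                                           SyncSession syncSession)
    {
        SyncContext context = base.GetChanges(groupMetadata, syncSession);

        // context.DataSet now holds the change set. This is where a surrogate
        // would have to be substituted -- but everything downstream (including
        // ApplyChanges on the other side) expects a real DataSet.
        return context;
    }
}
```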

     

    The blog entry above from Mahjayar has this quote in it regarding the DataSetSurrogate:

     

    "Download it and use it in your WCF based Synchronization apps and you should see vast improvement in your memory usage and performance. "

     

    That makes it sound like it should be very simple to just "use it", but so far I'm blocked.  Can anyone help me out?

     

    Thanks,

     

    David Cater

    • Moved by Hengzhe Li Friday, April 22, 2011 2:11 AM (From:SyncFx - Microsoft Sync Framework Database Providers [ReadOnly])
    Wednesday, January 21, 2009 3:50 PM

Answers

  • The Sync development team is already aware of this issue and is doing its best to improve performance when synchronizing data over the network.

    Thanks.
    Leo Zhou ------ This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, July 29, 2009 8:25 PM
    Answerer

All replies

  • There's another option, of course, that I'm hoping isn't the final answer.  I'm sure I could use the core Sync Framework SyncProvider services to create my own version of Sync Services for ADO.NET that works with DataSetSurrogate instead of DataSet.  That doesn't sound simple or straightforward at all, though, and I'm hoping that's not the solution.

     

    D.

    Wednesday, January 21, 2009 4:19 PM
  • Interesting you posted this today; I was about to do the same.  We have a limited number of users (only 5) doing a beta test of our application.  The database we are synchronizing is pretty small, only about 20,000 rows in total.  I configured our WCF HTTP proxy to use binary serialization.  However, due to the XML serialization problem of DataSets, the data we are transferring is pointlessly bloated.  Our network engineers thought a DoS attack was occurring when a tablet was syncing, due to the bloat.

     

    I am currently investigating applying GZIP compression to the WCF bindings and/or using a surrogate on the DataSet.  To get a reasonable result, I would imagine we would only have to do one or the other, not both.

     

    It would seem that the surrogate examples you and I found refer to a prior version of the Sync Framework.  Currently, I think that in order to use the surrogate we would have to create our own WCF middle-tier methods that return a type derived from SyncContext which carries a surrogate instead of a DataSet.  On the client, this package would reconstruct the DataSet and the SyncContext from the contained surrogate, roughly as in the sketch below.
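    A rough sketch of that package idea follows. All names are mine; it assumes SyncContext.DataSet has a public setter and that the sample's DataSetSurrogate can rebuild a DataSet via ConvertToDataSet(), neither of which I have verified:

```csharp
using System;
using Microsoft.Synchronization.Data;

// Travels over the WCF boundary instead of a raw SyncContext: the change data
// rides along as a binary-serializable surrogate, not an XML-serialized DataSet.
[Serializable]
class SyncContextPackage
{
    public SyncContext Context;        // sent with Context.DataSet nulled out
    public DataSetSurrogate Changes;   // the actual change payload

    // Server side: detach the DataSet and wrap it in the surrogate.
    public static SyncContextPackage FromContext(SyncContext context)
    {
        var package = new SyncContextPackage
        {
            Changes = new DataSetSurrogate(context.DataSet),
            Context = context
        };
        context.DataSet = null;        // keep the DataSet off the XML path
        return package;
    }

    // Client side: rebuild the DataSet and reattach it to the context.
    public SyncContext ToContext()
    {
        Context.DataSet = Changes.ConvertToDataSet();
        return Context;
    }
}
```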

     

    It all seems like quite a bit of work to do something which should be supported by Sync Services in the first place.  It should be noted that on their own website, Microsoft warns against transmitting DataSets over WCF due to incompatibility and bloat.  Yet an MS framework is transmitting DataSets anyway; someone must have missed that memo, maybe?

     

    I am going to run some tests this week using GZIP, and if that solves the problem well enough I'll post back here.  If not, I will probably create my own inherited sync classes that support the use of DataSet surrogates.

     

    Wednesday, January 21, 2009 7:23 PM
  • Thanks for the response.   I'm working on the SyncContext subclass right now.  So far it seems...tricky.  I'm not sure it's going to be possible, but I'm going to keep working on it.

     

    In the meantime, keep an eye on this posting as well: http://forums.microsoft.com/Sync/showpost.aspx?postid=4310992&siteid=75&&notification_id=2195910&message_id=2195910&agent=messenger .

     

    Also, you probably came across this post on using a custom binding extension to implement compression, but I figured it might be useful to mention it here anyway:

     

    http://forums.microsoft.com/sync/ShowPost.aspx?PostID=2735882&SiteID=75 

     

    D.

    Wednesday, January 21, 2009 7:30 PM
  • Update on attempting to use DataSetSurrogate:

     

    Based on a suggestion from PPCDude in the other post I mentioned, I started looking at manipulating the SyncContext returned by GetChanges.  That has not gone well.  Here's what I'm seeing:

    • SyncAgent::Synchronize is the public method you call to start synchronization.  It is not virtual.
    • Synchronize calls a private method UploadChanges.
    • UploadChanges calls GetChanges to get a SyncContext object ("sc").
    • UploadChanges calls remoteProvider::ApplyChanges, which takes a DataSet parameter.  sc.DataSet is passed to it.

    That sequence tells me that if I'm going to affect what goes over the wire by manipulating GetChanges, I'm going to have to set the DataSet property of the SyncContext to an instance of something derived from DataSet.

     

    At first I thought that might work out.  I changed DataSetSurrogate to derive from DataSet and overrode DataSet::GetObjectData.  GetObjectData is defined by the ISerializable interface, and I thought it would be called when the DataSet was serialized for output over the wire.

     

    Unfortunately, it was not.  DataSet also implements IXmlSerializable, which defines a WriteXml method.  That explicit IXmlSerializable.WriteXml implementation is what gets called during serialization, and it is private within DataSet.  The public WriteXml helper it calls is not virtual.
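    For the record, the failed attempt looked roughly like this (a sketch; the class name is mine):

```csharp
using System;
using System.Data;
using System.Runtime.Serialization;

[Serializable]
class SurrogateDataSet : DataSet
{
    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // Never reached during WCF serialization: DataSet goes out through its
        // explicit IXmlSerializable.WriteXml implementation, not ISerializable,
        // so this override is bypassed entirely.
        base.GetObjectData(info, context);
    }
}
```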

     

    I could be wrong, but at this point I think that the GetChanges/ApplyChanges avenue is a dead end.  I'm thinking I'll look into custom bindings and custom binding extensions to see how bindings like netTcpBinding do their binary encoding, and whether I can plug in my own serializer in some way.  That way I could give it a surrogate selector and force any DataSet it serializes to use a DataSetSurrogate wrapper instead.

     

    David

    Wednesday, January 21, 2009 10:11 PM
  • I am seeing the same problem.  I have ~24,000 rows in 4 tables, and it takes almost 6 minutes to perform a snapshot sync (n-tier; WCF uses netTcpBinding).

    Not sure whether to try GZIP encoding or DataSetSurrogate.  If the problem is that DataSet serialization takes a lot of memory, then GZIP probably will not help much.

    Is there a sample showing sync with DataSetSurrogate?

    Thanks, Peter


    • Edited by peter clift Wednesday, May 6, 2009 11:20 PM improve
    Wednesday, May 6, 2009 10:41 PM

  • I am still waiting for a reply to my previous post.... (My product will ship soon)

    Thanks, Peter

    Thursday, May 28, 2009 3:38 PM
  • Hello,

    We have the same problem.  In the meantime, is there an example that shows how to use the surrogate with the Sync Framework?

    Thanks, Boris
    Tuesday, June 30, 2009 1:01 PM
  • Hi, I'd love to know how to plumb the surrogate class into the Sync Framework.  I haven't got a clue where to start.  I'm having an absolute nightmare with Sync Services performance (syncing 80,000 rows, and I could watch the Star Wars trilogy in the time it takes to sync!)
    Friday, July 10, 2009 5:29 PM
  • The Sync development team is already aware of this issue and is doing its best to improve performance when synchronizing data over the network.

    Thanks.
    Leo Zhou ------ This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, July 29, 2009 8:25 PM
    Answerer
  • Hello Leo,

    When can we expect a release that fixes this issue?
    Some of us need better performance now, not in several months.

    So it would be interesting to know, because we have to decide whether to fix it ourselves or wait for your fix.


    Regards,

    Martin
    Thursday, July 30, 2009 8:46 AM
  • Hey Martin, I managed to get something similar working with GZIP encoding; it definitely made things a bit faster over the wire, and most of it was lifted straight from an MSDN sample - http://msdn.microsoft.com/en-us/library/dd938879.aspx#UsingCompressiontoReduceDataTransferVolumesoverWCF (see also: http://social.microsoft.com/Forums/en-US/synctechnicaldiscussion/thread/faf2ddc2-a3c6-495f-9d04-0ac15dbc7f43)
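    The MSDN sample wires compression in as a custom WCF message encoder, but the compression step itself is just GZIP over the serialized bytes, something like the sketch below (full framework only; .NET CF doesn't ship System.IO.Compression, so the device side needs its own implementation):

```csharp
using System.IO;
using System.IO.Compression;

static class GZipHelper
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }   // dispose the GZipStream so the GZIP footer is flushed
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // Manual copy loop keeps this .NET 2.0/3.5-compatible (no Stream.CopyTo).
            var buffer = new byte[4096];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}
```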


    Thursday, July 30, 2009 7:08 PM
  • Hi SunHunter,

    thanks!

    Does it also work if the client is a .NET Compact Framework application?


    Regards,

    Martin
    Friday, July 31, 2009 8:00 AM
  • Yes, it works fine with .NET CF; I'm using .NET CF 3.5 on the client.  The data transfer speeds are much better, but there's still room for improvement when inserting/updating downloaded rows into the client database (ApplyingChanges takes an age), and I guess true binary serialization would improve transfer speeds still further (as mentioned by the original poster).  At least it's a start ;) ...
    • Edited by SunHunter Friday, July 31, 2009 11:27 AM
    Friday, July 31, 2009 11:25 AM
  • Yes, I realized that too.

    The INSERT/UPDATE performance of SQL CE is VERY poor.
    But I think this has nothing to do with the Sync Framework; it's a general problem of .NET CF.

    Still, I hope that surrogates may help, because we can save memory.
    Friday, July 31, 2009 12:10 PM