Avoiding Empty Batches

  • Question

  • I have a question along the lines of this thread: http://social.microsoft.com/forums/en-us/syncdevdiscussions/thread/F841376A-3349-42F6-BA2F-F1CF8DB9BDDD

    I have a solution using v1.0 because there is a mobile app involved.  We use change tracking in SQL Server 2008 and a WCF service that the client apps talk to.  The data is logically partitioned among different clients.  With a lot of changes, we can see lots of empty or nearly empty batches and therefore lots of round trips when I could probably get away with a couple.  This is because we may have updated 10000 rows, but only every 100th update is for a particular client app.  For example, we have 10000 customers and calculate the total sales for each customer nightly, but each customer representative only cares about his 100 customers.  So if batch size is set to 1000 rows max, we have to do 10 batches to get our 100 updates, but really we could have just done one batch.

    One thought I had was to serialize the SyncContext object that GetChanges returns and, if it is bigger than my WCF return size, write it out to disk and return it to the client in chunks of XML (i.e., divide the XML string into pieces no larger than the size I allow for a return). The proxy that calls the WCF service would check the result, see whether it needs to fetch the SyncContext in chunks, retrieve and reassemble them, deserialize the whole thing, and then let it all flow the way it should (a rough sketch of this follows the post). Or just serialize out the dataset, or something like that. Is that how v2 controls batching by data size? Any other workaround suggestions?

    I also thought about dividing groups of tables across different sync calls, but that would require at least one round trip per group of tables, which would be wasteful when there are few or no changes and a single call would have sufficed.

    Has anyone had other thoughts on working around this issue?

    Wednesday, December 15, 2010 2:50 AM
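
    A minimal sketch of the chunking idea from the question, server side. Nothing here is Sync Services API: ChangePayloadStore, the pickup token, and the 512 KB limit are all hypothetical, and the client proxy would call GetChunk in a loop until it returns null, then rebuild the dataset with ReadXml in XmlReadMode.DiffGram. DiffGram mode matters because plain XML output would lose the row states the framework needs.

        using System;
        using System.Collections.Generic;
        using System.Data;
        using System.IO;

        public static class ChangePayloadStore
        {
          // payloads awaiting pickup, keyed by a token handed back to the client
          private static readonly Dictionary<string, string> _payloads =
            new Dictionary<string, string>();

          private const int MaxReturnChars = 512 * 1024; // whatever the binding allows

          // Serializes the change dataset; returns null if it fits inline,
          // otherwise stores it and returns a pickup token for the proxy.
          public static string StoreIfTooLarge(DataSet changes, out string inlineXml)
          {
            StringWriter writer = new StringWriter();
            // DiffGram keeps Added/Modified/Deleted row states intact
            changes.WriteXml(writer, XmlWriteMode.DiffGram);
            string xml = writer.ToString();

            if (xml.Length <= MaxReturnChars)
            {
              inlineXml = xml;
              return null;
            }

            inlineXml = null;
            string token = Guid.NewGuid().ToString();
            _payloads[token] = xml;
            return token;
          }

          // Called by the proxy with chunkIndex 0, 1, 2, ... until it returns null.
          public static string GetChunk(string token, int chunkIndex)
          {
            string xml;
            if (!_payloads.TryGetValue(token, out xml))
              return null;

            int start = chunkIndex * MaxReturnChars;
            if (start >= xml.Length)
            {
              _payloads.Remove(token); // everything delivered; free the memory
              return null;
            }

            return xml.Substring(start, Math.Min(MaxReturnChars, xml.Length - start));
          }
        }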

Answers

  • I ended up doing something like this where I clone the dataset and send it to the client in chunks.  The client proxy knows how to come back for more and piece it together before letting the framework take over.  It seems to work and avoids round trips for empty or partially empty batches.
    Thursday, January 13, 2011 8:16 PM

All replies

  • I'm assuming you have implemented your own custom batching, similar to the post you referred to above.

    Empty batches are a result of the incremental-changes select, so you can address it in the query itself, similar to the post you referred to. Serializing the SyncContext doesn't solve your empty-batches issue.

    If you're using WCF, the SyncContext is actually serialized already. If you do your own custom serialization, you'll have to deal with chunking the dataset, reassembling it on the other side, converting it back to a dataset, cleaning up temporary files, etc.

    In the collaboration providers, batching is implemented via DataSetSurrogates (think of the dataset converted to a binary byte array, sent in chunks or spooled to files, converted back to a dataset on the receiving side, and applied as a single transaction). It's meant to avoid sending the dataset in one go while also minimizing the payload by not serializing the dataset as XML. A rough stand-in for that approach is sketched after this post.

    Wednesday, December 15, 2010 5:21 AM
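
    A rough stand-in for the surrogate approach described in the reply above, using ADO.NET's binary remoting format (the collaboration providers use an internal DataSetSurrogate type, so this only approximates the idea). The byte array preserves row states and can be sent in slices and reassembled just like the XML string in the earlier sketch, at a fraction of the payload size.

        using System.Data;
        using System.IO;
        using System.Runtime.Serialization.Formatters.Binary;

        public static class BinaryDataSetCodec
        {
          // Serialize the dataset as binary rather than XML; this is the
          // payload saving the surrogate approach is after.
          public static byte[] ToBytes(DataSet ds)
          {
            ds.RemotingFormat = SerializationFormat.Binary;
            using (MemoryStream ms = new MemoryStream())
            {
              new BinaryFormatter().Serialize(ms, ds);
              return ms.ToArray();
            }
          }

          // Rebuild the dataset, row states included, on the receiving side.
          public static DataSet FromBytes(byte[] bytes)
          {
            using (MemoryStream ms = new MemoryStream(bytes))
            {
              return (DataSet)new BinaryFormatter().Deserialize(ms);
            }
          }
        }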
  • I don't have custom batching; I just followed the standard batching per the MSDN articles. I wrote a custom proc for the get-anchor methods which sets my anchors for the batching (i.e., get from change-tracking anchor 10930 to 20930). The framework then handles calling my getXXXIncrementalInserts, getXXXIncrementalUpdates, and getXXXIncrementalDeletes for each table (where XXX is one of my various tables, like Customer or Order). I'm not sure how I could address it in the getXXX queries when I have 20 or 30 tables that get synced. Within one anchor window (i.e., between 10930 and 20930), one table may have 0 changes, one may have a few, and one may have 1000. Would I somehow have an opportunity to change the anchors based on how many rows were returned? Would I have to try the first anchor window, get few results, and then turn around and get more changes by calling GetChanges on my server provider again? I'm not sure how I would do that. (One possible angle via the new-anchor command is sketched after this post.)

    And yes, serializing or converting the dataset to a byte array, chunking it as a download, and piecing it together on the client side is what I was thinking. Not trivial, I know, but it gets rid of batching entirely by getting everything and then chunking the data as needed.

    Please reply as I'd appreciate any thoughts. 

    Wednesday, December 15, 2010 2:14 PM
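
    On the question of changing anchors based on row counts: with DbServerSyncProvider, the batch window is whatever SelectNewAnchorCommand produces, so one option is to point it at a stored procedure that, instead of stepping the anchor by a fixed number of change-tracking versions, scans the change table for the requesting client's partition and stops once roughly @sync_batch_size relevant rows are covered. The session parameter names below are the documented ones; the proc name and the @ClientId partition parameter are hypothetical. The trade-off is a more expensive anchor query in exchange for fewer near-empty round trips.

        using System.Data;
        using System.Data.SqlClient;
        using Microsoft.Synchronization.Data;
        using Microsoft.Synchronization.Data.Server;

        static void ConfigureBatchAnchor(DbServerSyncProvider provider, SqlConnection conn)
        {
          SqlCommand anchorCmd = new SqlCommand();
          anchorCmd.Connection = conn;
          anchorCmd.CommandType = CommandType.StoredProcedure;
          // hypothetical proc: walks the client's relevant changes and returns
          // the version that covers about @sync_batch_size of them
          anchorCmd.CommandText = "usp_GetNewBatchAnchorForClient";
          anchorCmd.Parameters.Add("@" + SyncSession.SyncLastReceivedAnchor, SqlDbType.BigInt);
          anchorCmd.Parameters.Add("@" + SyncSession.SyncBatchSize, SqlDbType.Int);
          anchorCmd.Parameters.Add("@" + SyncSession.SyncMaxReceivedAnchor, SqlDbType.BigInt)
                   .Direction = ParameterDirection.Output;
          anchorCmd.Parameters.Add("@" + SyncSession.SyncNewReceivedAnchor, SqlDbType.BigInt)
                   .Direction = ParameterDirection.Output;
          anchorCmd.Parameters.Add("@" + SyncSession.SyncBatchCount, SqlDbType.Int)
                   .Direction = ParameterDirection.InputOutput;
          anchorCmd.Parameters.Add("@ClientId", SqlDbType.UniqueIdentifier); // partition key (hypothetical)

          provider.SelectNewAnchorCommand = anchorCmd;
        }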
  • Another thought is to not do batching in the procs at all: just get everything, and then, instead of serializing out the dataset manually, check row counts. If the total rows are over my threshold, move the overflow rows into temporary datasets that I save in memory (I think that would work if they were held in a static property), or maybe I'd still have to serialize them out to disk or a database; either way, what I'm saving off is a full dataset that holds a portion of the changes. My proxy class could then check an indicator in the result (maybe using a sync session parameter) to know to call back, passing the parameter up again. The proxy would reconstruct the full dataset on the client before letting the Sync Framework pieces take over again. So it's similar to the thought above, but now just trimming the dataset so that it can be returned over the same WCF interface instead of as a byte stream or something.
    Thursday, December 16, 2010 12:08 AM
  • Still working through this to see if it will work, but stubbing it out, my class that derives from DbServerSyncProvider would look like this. A different method would be used to get the subsequent batched datasets, and the rows would be pieced back together on the client side before proceeding with applying the changes (a client-side counterpart is sketched after the code). The reason I think I need to send actual datasets is that we expect the time-zone adjustment for datetime columns to occur over the wire; if we just serialized the dataset and downloaded it as bytes, the values would be off (we do something to adjust time zones in the download of the initial snapshot, but it would be nice not to have to do that here). Any thoughts on this route?

        // Holds the overflow batch datasets by key until the client proxy
        // comes back for them (static so it survives across service calls).
        private static readonly Dictionary<string, DataSet> _batchedDataSets =
          new Dictionary<string, DataSet>();

        public override SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession)
        {
          SyncContext syncContext = base.GetChanges(groupMetadata, syncSession);

          // begin test code

          DataSet dsCurrentBatch = syncContext.DataSet.Clone(); // schema-only clone
          DataSet dsFirstBatch = dsCurrentBatch;

          bool batchesRequired = false;
          int maxRowCount = 2000;
          int rowCount = 0;

          foreach (DataTable table in syncContext.DataSet.Tables)
          {
            foreach (DataRow row in table.Rows)
            {
              // once the current batch is full, register it under a new key
              // and start filling a fresh schema-only clone
              if (rowCount > 0 && rowCount % maxRowCount == 0)
              {
                dsCurrentBatch = syncContext.DataSet.Clone();
                string dsID = Guid.NewGuid().ToString();
                if (syncSession.SyncParameters["BatchedDataSetKeys"] == null)
                {
                  syncSession.SyncParameters.Add(new SyncParameter("BatchedDataSetKeys", string.Empty));
                }
                if (syncSession.SyncParameters["BatchedDataSetKeys"].Value.ToString() != string.Empty)
                {
                  syncSession.SyncParameters["BatchedDataSetKeys"].Value =
                    syncSession.SyncParameters["BatchedDataSetKeys"].Value.ToString() + "|";
                }
                syncSession.SyncParameters["BatchedDataSetKeys"].Value =
                  syncSession.SyncParameters["BatchedDataSetKeys"].Value.ToString() + dsID;
                _batchedDataSets.Add(dsID, dsCurrentBatch);
                batchesRequired = true;
              }

              // ImportRow preserves the RowState (Added/Modified/Deleted),
              // which the sync runtime needs when applying the changes
              dsCurrentBatch.Tables[table.TableName].ImportRow(row);
              rowCount++;
            }
          }

          if (batchesRequired)
          {
            // hand back only the first batch; the proxy fetches the rest
            // by key and reassembles them before the runtime applies changes
            syncContext.DataSet = dsFirstBatch;
          }

          // end test code

          return syncContext;
        }
    Thursday, December 16, 2010 4:46 AM
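
    For completeness, the client-side counterpart of the scheme above might look roughly like this, assuming the updated session parameters make it back to the proxy (e.g., the service returns the session or carries the key list in the response some other way) and the service exposes a hypothetical GetBatchedDataSet operation for picking up the stored batches. ISyncService and its members are illustrative, not part of Sync Services; only ServerSyncProvider and its overrides are the real extension points.

        using System.Collections.ObjectModel;
        using System.Data;
        using Microsoft.Synchronization;
        using Microsoft.Synchronization.Data;

        // hypothetical WCF contract: mirrors the provider calls plus batch pickup
        public interface ISyncService
        {
          SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession);
          SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession);
          SyncSchema GetSchema(Collection<string> tableNames, SyncSession syncSession);
          SyncServerInfo GetServerInfo(SyncSession syncSession);
          DataSet GetBatchedDataSet(string dsID);
        }

        public class BatchingServerSyncProviderProxy : ServerSyncProvider
        {
          private readonly ISyncService _service;

          public BatchingServerSyncProviderProxy(ISyncService service)
          {
            _service = service;
          }

          public override SyncContext GetChanges(SyncGroupMetadata groupMetadata, SyncSession syncSession)
          {
            SyncContext syncContext = _service.GetChanges(groupMetadata, syncSession);

            SyncParameter keysParam = syncSession.SyncParameters["BatchedDataSetKeys"];
            if (keysParam != null && keysParam.Value.ToString() != string.Empty)
            {
              foreach (string dsID in keysParam.Value.ToString().Split('|'))
              {
                // pull one stored batch at a time and fold it into the first;
                // Merge keeps the incoming rows' states so the runtime can
                // still apply them as inserts/updates/deletes
                DataSet batch = _service.GetBatchedDataSet(dsID);
                syncContext.DataSet.Merge(batch);
              }
            }

            return syncContext;
          }

          // the remaining provider calls just forward to the service
          public override SyncContext ApplyChanges(SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession)
          {
            return _service.ApplyChanges(groupMetadata, dataSet, syncSession);
          }

          public override SyncSchema GetSchema(Collection<string> tableNames, SyncSession syncSession)
          {
            return _service.GetSchema(tableNames, syncSession);
          }

          public override SyncServerInfo GetServerInfo(SyncSession syncSession)
          {
            return _service.GetServerInfo(syncSession);
          }
        }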