How to avoid empty batches when syncing filtered data?

  • Question

  • I've got a solution that syncs a filtered set of database tables to a Windows Mobile 6 device. The data is partitioned on the device, both to reduce the volume of data transported to the client and for security reasons.

    So far I have got away without batching, but last week some clients began having out-of-memory issues: the volume of data in the system is increasing, and the amount arriving on the device is now more than the client can handle. So now I *have* to implement batching, because the system is unusable until I do.

    The problem is that I have attempted batching before and failed: because the data is filtered, I hit a serious performance problem with lots of empty batches. The data is not stored contiguously in the database table, and I'm not sure how to implement *efficient* batching logic over filtered data.

    All of the batching examples I've seen use timestamp values and a very crude SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size to calculate the next anchor value. That calculation assumes contiguous data and is not valid when the data is filtered; in my case, where a client downloads less than 1% of the table, it leads to unacceptable performance (syncs taking *hours* instead of a few minutes, due to thousands of empty batches).
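    To make the failure mode concrete, here is a small Python sketch (not Sync Framework code; timestamps are plain integers standing in for SQL rowversion values) of the fixed-step anchor calculation described above, applied to a client whose filter matches only a few scattered rows:

    ```python
    def naive_batches(change_timestamps, last_anchor, max_anchor, batch_size):
        """Advance the anchor by a fixed batch_size each round, as in
        SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size."""
        batches = []
        anchor = last_anchor
        while anchor < max_anchor:
            new_anchor = min(anchor + batch_size, max_anchor)
            # Rows whose timestamp falls inside the (anchor, new_anchor] window
            rows = [t for t in change_timestamps if anchor < t <= new_anchor]
            batches.append(rows)
            anchor = new_anchor
        return batches

    # Only 3 changed rows for this client, scattered across a large timestamp range:
    changes = [150, 9_800, 9_900]
    batches = naive_batches(changes, last_anchor=0, max_anchor=10_000, batch_size=100)
    empty = sum(1 for b in batches if not b)
    print(len(batches), empty)  # 100 rounds, 97 of them empty
    ```

    Every one of those empty rounds still costs a client/server round trip, which is where the hours-long syncs come from.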

    I just wondered whether anybody has run into and worked around this issue. I can't be the only person to have hit it, particularly given the out-of-memory problem, which has to be solved somehow.
    Saturday, January 2, 2010 12:52 PM

Answers

  • This is a valid issue, and at the same time a known limitation of the Sync Services v1 batching support. As you mentioned, v1 batching assumes the data changes are contiguous with respect to the anchor values, rather than following the data values in the base table.

    The poor performance from the relatively large number of empty syncs comes mainly from the client (especially on a device) and the transport.

    In terms of a workaround, a few things need to be considered:
    1. How do you group the tables being synced? Remember that anchors are committed at the sync-group level.
    2. Is the same server provider instance reused at the mid-tier? The get-new-anchor command is set at the provider level.
    3. Related to #2: how easily can the batch size be set dynamically for each sync session at the mid-tier?

    I think you are heading in the right direction by working out what the "valid" next anchor is for this particular client. Besides what you proposed, a similar approach could be (logically):

    1. Add a new web service method at the mid-tier to peek at the next anchor for the client -- this should be very similar (if not exactly the same) to your query above.
    2. Based on the value "peeked" above, set the table's received anchor on the client to peekedValue - batchSize.
    3. Do the actual sync.

    This way you can reduce the empty syncs due to "no changes for this anchor" to at most 1, and you avoid having to maintain the anchor-value table described above.
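    The three steps above can be sketched in Python as follows (illustrative only -- `peek_next_anchor` stands in for the proposed web service method and is not a Sync Framework API; timestamps are integers standing in for rowversion values):

    ```python
    def peek_next_anchor(change_timestamps, last_anchor):
        """Mid-tier helper: the smallest change timestamp after the client's anchor."""
        later = [t for t in change_timestamps if t > last_anchor]
        return min(later) if later else None

    def adjusted_anchor(change_timestamps, last_anchor, batch_size):
        """Client side: rewind the received anchor to peeked_value - batch_size so
        the next batch of width batch_size is guaranteed to contain the peeked change."""
        peeked = peek_next_anchor(change_timestamps, last_anchor)
        if peeked is None:
            return last_anchor  # nothing to sync; leave the anchor alone
        return max(last_anchor, peeked - batch_size)

    changes = [150, 9_800, 9_900]
    print(adjusted_anchor(changes, last_anchor=200, batch_size=100))  # 9700
    ```

    With the anchor rewound to 9700, the first fixed-size batch (9700, 9800] already contains data, so at most one empty round can occur per table.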

    (Note: I rushed to share these thoughts without actually implementing them, so please do test to make sure this fits your needs.) I will be glad to discuss more options for this issue.

    Thanks,
    Yunwen


    This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, March 18, 2010 9:09 AM
    Moderator

All replies

  • -- Create a table of modified/created timestamps for this sync session
    -- since the last @sync_last_received_anchor
    INSERT INTO zTimeStamps
        SELECT a.TimeStampCreated AS TimeStampCreatedUpdated, @sync_session_id
        FROM Areas a
            JOIN Blocks b ON b.BlockID = a.BlockID
            JOIN Estates e ON e.EstateID = b.EstateID
        WHERE e.ContractID = @ContractID
            AND e.EstateID = CASE WHEN @EstateID > 0 THEN @EstateID ELSE e.EstateID END
            AND a.TimeStampCreated > @sync_last_received_anchor

        UNION

        SELECT a.TimeStampUpdated, @sync_session_id
        FROM Areas a
            JOIN Blocks b ON b.BlockID = a.BlockID
            JOIN Estates e ON e.EstateID = b.EstateID
        WHERE e.ContractID = @ContractID
            AND e.EstateID = CASE WHEN @EstateID > 0 THEN @EstateID ELSE e.EstateID END
            AND a.TimeStampUpdated > @sync_last_received_anchor

        -- TODO: UNION created/modified timestamps for all other sync tables (if absolutely needed)

        ORDER BY TimeStampCreatedUpdated

    SET @sync_max_received_anchor =
        (SELECT MAX(TimeStampCreatedUpdated) FROM zTimeStamps WHERE syncSessionId = @sync_session_id)

    -- Check for gaps in the timestamp sequence to avoid empty batches
    SET @sync_new_received_anchor = dbo.spGetNextBatchAnchor(@sync_last_received_anchor, @sync_batch_size, @sync_session_id)

    -- Pseudo-code for spGetNextBatchAnchor
    CREATE PROC spGetNextBatchAnchor
    (
        @sync_last_received_anchor TIMESTAMP,
        @sync_batch_size INT,
        @sync_session_id INT
    )
    AS
        DECLARE @sync_new_received_anchor TIMESTAMP
        -- Look up the timestamp corresponding to
        -- @sync_last_received_anchor + @sync_batch_size (for this @sync_session_id).
        -- If a timestamp exists, everything is OK: return the calculated timestamp.
        -- If no timestamp exists for the calculated value, find and return the next
        -- highest timestamp in the table of available timestamps for this client.
        RETURN @sync_new_received_anchor
    Above is the general approach that I'm planning to take to get around the empty-batch issue.

    1. Build a table (zTimestamps) containing all modified and created timestamps since @sync_last_received_anchor. This is quite an expensive operation, and I have a couple of questions here: is it necessary to build a list of timestamps for *all* tables in all sync groups, or is it possible to pass in the name of the table or sync group currently being synced, so that the list of timestamps is relevant only to the table about to be synced? I'd like the code that builds the timestamp table to run just once per sync, so I would check @sync_session_id to see whether the table already exists.

    2. Instead of simply adding the batch size to @sync_last_received_anchor to obtain @sync_new_received_anchor, which may yield an anchor containing no data (an empty batch), I would look up the anchor in the zTimestamps table to see whether data with that timestamp exists. If no record contains that timestamp, I would return the next highest timestamp for this client, so that the next batch is guaranteed to contain some data.

    Thoughts on this approach? It's a bit of a hack, but it's the only way I can think of to get around the empty-batch issue.
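    For anyone following along, the spGetNextBatchAnchor pseudo-code above can be modelled in a few lines of Python (illustrative only: timestamps are integers standing in for rowversion values, and the sorted list stands in for this session's zTimestamps rows):

    ```python
    import bisect

    def next_batch_anchor(session_timestamps, last_anchor, batch_size):
        """Return an anchor such that the window (last_anchor, anchor] is never
        empty, assuming session_timestamps is sorted ascending."""
        candidate = last_anchor + batch_size
        # Index of the first change strictly after the client's last anchor
        i = bisect.bisect_right(session_timestamps, last_anchor)
        if i >= len(session_timestamps):
            return candidate  # no more changes; the sync will terminate normally
        first_change = session_timestamps[i]
        # If the fixed-size step already covers a change, keep it; otherwise
        # jump straight to the next timestamp that actually has data.
        return candidate if first_change <= candidate else first_change

    stamps = [150, 9_800, 9_900]
    print(next_batch_anchor(stamps, last_anchor=200, batch_size=100))  # 9800
    ```

    Since the intermediate timestamps belong to rows filtered out for this client, jumping the anchor forward skips them all in a single batch instead of crawling through them one empty window at a time.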

    Saturday, January 2, 2010 4:00 PM
  • I think we had a similar discussion about empty batches in this thread, which you might find useful: http://social.microsoft.com/Forums/en-US/syncdevdiscussions/thread/08e6f9dd-bc48-4742-9cad-aea0290745b2/


    The idea was to check, in the get-new-batch-anchor stored procedure, whether the new batch anchor returns any rows; if it doesn't, grab a new batch anchor until you get a non-empty batch.
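    A minimal sketch of that retry loop (illustrative Python, not the actual stored procedure from the linked thread; timestamps are integers standing in for rowversion values):

    ```python
    def non_empty_anchor(change_timestamps, last_anchor, max_anchor, batch_size):
        """Widen the anchor window by whole batch sizes until it contains at
        least one change, or until we reach the maximum anchor."""
        anchor = last_anchor
        while anchor < max_anchor:
            anchor = min(anchor + batch_size, max_anchor)
            if any(last_anchor < t <= anchor for t in change_timestamps):
                return anchor
        return max_anchor  # no changes left; the final (empty) window ends the sync

    changes = [150, 9_800, 9_900]
    print(non_empty_anchor(changes, last_anchor=200, max_anchor=10_000, batch_size=100))  # 9800
    ```

    The scan still walks the gap, but it happens once inside the stored procedure rather than as thousands of client round trips, which is what makes the difference on a device.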
    Thursday, March 18, 2010 1:33 PM
    Moderator