Batching - SQL Server Change Tracking

  • Question

  • Hello,

    I am currently using MSF v2.0 with the offline scenario.  My server is using SQL Server 2008 with the DbServerSyncProvider, and my client is using SQL Server 2008 Express with the SqlExpressClientSyncProvider.  Both databases are using SQL Server Change Tracking.

    Everything is working fine, except I cannot get batching to work.  I am using the following article as a guideline:

    How to: Specify the Order and Batch Size of Changes:

    http://msdn.microsoft.com/en-us/library/bb902828(SQL.105).aspx

    The difference from the above example is that I am using SQL Server Change Tracking rather than timestamp columns.  I modified the usp_GetNewBatchAnchor sproc as follows:

    Create PROCEDURE [sync].[usp_GetNewBatchAnchor]
        @sync_initialized          bigint,
        @sync_last_received_anchor bigint,
        @sync_batch_size           bigint,
        @sync_max_received_anchor  bigint out,
        @sync_new_received_anchor  bigint out,
        @sync_batch_count          int output
    AS
    Begin
        -- Set a default batch size if a valid one is not passed in.
        IF @sync_batch_size IS NULL OR @sync_batch_size <= 0
            SET @sync_batch_size = 10

        -- Before selecting the first batch of changes,
        -- set the maximum anchor value for this synchronization session.
        -- After the first time that this procedure is called,
        -- Sync Framework passes a value for @sync_max_received_anchor
        -- to the procedure. Batches of changes are synchronized until this
        -- value is reached.
        IF @sync_max_received_anchor IS NULL
            SELECT @sync_max_received_anchor = change_tracking_current_version()

        -- If this is the first synchronization session for a database,
        -- get the lowest valid change tracking version from the tables. By default,
        -- Sync Framework uses a value of 0 for @sync_last_received_anchor
        -- on the first synchronization. If you do not set @sync_last_received_anchor,
        -- this can cause empty batches to be downloaded until the lowest
        -- version value is reached.
        IF @sync_initialized = 0 OR @sync_last_received_anchor IS NULL OR @sync_last_received_anchor = 0
        BEGIN
            SELECT @sync_last_received_anchor = Max( CHANGE_TRACKING_MIN_VALID_VERSION(object_id) )
            FROM   sys.change_tracking_tables

            SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size

            -- Determine how many batches are required during the initial synchronization.
            IF @sync_batch_count <= 0
                SET @sync_batch_count = ((@sync_max_received_anchor / @sync_batch_size) - (@sync_last_received_anchor / @sync_batch_size))
        END
        ELSE
        BEGIN
            SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size

            -- Determine how many batches are required during subsequent synchronizations.
            IF @sync_batch_count <= 0
                SET @sync_batch_count = ((@sync_max_received_anchor / @sync_batch_size) - (@sync_new_received_anchor / @sync_batch_size)) + 1
        END

        -- Check whether this is the last batch.
        IF @sync_new_received_anchor >= @sync_max_received_anchor
        BEGIN
            SET @sync_new_received_anchor = @sync_max_received_anchor
            IF @sync_batch_count <= 0
                SET @sync_batch_count = 1
        END
    End

    What am I missing?  I put a breakpoint in the "DbServerSyncProvider.ChangesSelected" event, and it is attempting to return thousands of rows even though I have set "DbServerSyncProvider.BatchSize" to 10.

    Do I need to add/enable anything in the client code/sprocs?

    Thanks for your help,
    Garry
    Tuesday, February 9, 2010 5:36 AM

All replies

  • what was the object_id you're passing here?

    Select @sync_last_received_anchor = Max( CHANGE_TRACKING_MIN_VALID_VERSION(object_id) )

    Tuesday, February 9, 2010 10:33 AM
  • The "object_id" is a column in the "sys.change_tracking_tables" table.  This table contains a list of all of the tables in the database that are configured for change tracking.

    All of the tables in my application that are configured for change tracking are being used by the Sync framework.

    I am using the MAX value from this query as the starting value for "sync_last_received_anchor" since this is the first time that the user is initializing and I can only return the data that hasn't been cleaned up by SQL Server Change Tracking.

    Thanks,
    Garry
    Tuesday, February 9, 2010 10:42 AM
  • i understand. but what is the exact object_id you're passing? try executing this alone: Select Max( CHANGE_TRACKING_MIN_VALID_VERSION(object_id) )


    checkout http://msdn.microsoft.com/en-us/library/bb895183.aspx
    Tuesday, February 9, 2010 11:09 AM
  • I get 607 when I execute the following query:

    Select Max( CHANGE_TRACKING_MIN_VALID_VERSION(object_id) ) from sys.change_tracking_tables

    I have 46 tables with change tracking enabled.  They all have different minimum valid versions.

    I don't have an issue with the data that is being returned; my issue is with how much data is being returned at once.  I have configured the batch size to be 10 (for test purposes) and it is returning thousands of rows, which results in a WCF "QuotaExceededException".  I have set "MaxReceivedMessageSize" to 10485760 (10MB), which is high enough for my application.  I want to set a reasonable batch size so that the data will not exceed 10MB.

    Thanks,
    Garry
    Tuesday, February 9, 2010 11:20 AM
  • Hi Garry,

    I assume you've set the SelectNewAnchorCommand property to your stored proc. You might want to run SQL Profiler to trace the incremental selects being sent by SyncFx and confirm the values in the CHANGETABLE(CHANGES ...) query.
    Tuesday, February 9, 2010 11:42 AM
  • Thanks for your help JuneT.

    Yes, I have set "SelectNewAnchorCommand" and the BatchSize property. Profiler recorded the following (ignore the sproc name; in my first post I renamed it to match the online example):


    declare @p4 bigint
    set @p4=642
    declare @p5 bigint
    set @p5=617
    declare @p6 int
    set @p6=4
    exec sync.uspAnchorSelectNewAnchor @sync_initialized=0,@sync_last_received_anchor=0,@sync_batch_size=10,@sync_max_received_anchor=@p4 output,@sync_new_received_anchor=@p5 output,@sync_batch_count=@p6 output
    select @p4, @p5, @p6
    GO
    
    declare @p4 bigint
    set @p4=642
    declare @p5 bigint
    set @p5=627
    declare @p6 int
    set @p6=4
    exec sync.uspAnchorSelectNewAnchor @sync_initialized=1,@sync_last_received_anchor=617,@sync_batch_size=10,@sync_max_received_anchor=@p4 output,@sync_new_received_anchor=@p5 output,@sync_batch_count=@p6 output
    select @p4, @p5, @p6
    GO
    
    declare @p4 bigint
    set @p4=642
    declare @p5 bigint
    set @p5=637
    declare @p6 int
    set @p6=4
    exec sync.uspAnchorSelectNewAnchor @sync_initialized=1,@sync_last_received_anchor=627,@sync_batch_size=10,@sync_max_received_anchor=@p4 output,@sync_new_received_anchor=@p5 output,@sync_batch_count=@p6 output
    select @p4, @p5, @p6
    GO
    
    declare @p4 bigint
    set @p4=642
    declare @p5 bigint
    set @p5=642
    declare @p6 int
    set @p6=4
    exec sync.uspAnchorSelectNewAnchor @sync_initialized=1,@sync_last_received_anchor=637,@sync_batch_size=10,@sync_max_received_anchor=@p4 output,@sync_new_received_anchor=@p5 output,@sync_batch_count=@p6 output
    select @p4, @p5, @p6
    GO
    
    
    declare @p4 bigint
    set @p4=642
    declare @p5 bigint
    set @p5=617
    declare @p6 int
    set @p6=4
    exec sync.uspAnchorSelectNewAnchor @sync_initialized=0,@sync_last_received_anchor=0,@sync_batch_size=10,@sync_max_received_anchor=@p4 output,@sync_new_received_anchor=@p5 output,@sync_batch_count=@p6 output
    select @p4, @p5, @p6
    GO

    The sproc was called 5 times.  I have two SyncGroups.  The first 4 SelectNewAnchor calls are for the first SyncGroup (all my lookup tables, not a lot of data).  The first SyncGroup is downloaded and inserted into my client database.  The last SelectNewAnchor call is for the SyncGroup that contains around 35 tables.  These tables hold a lot of data.

    The last SelectNewAnchor call returned:
    @sync_max_received_anchor = 642
    @sync_new_received_anchor = 617
    @sync_batch_count = 4

    These values look correct to me.

    After the last SelectNewAnchor call, it calls the "SelectIncrementalInserts/Updates/Deletes" sprocs for all 35 tables in my SyncGroup.  Then I get the WCF error stating that the data has exceeded the 10MB that I have configured.  This shouldn't be possible if the batch size is set to 10; 10 rows can't contain more than 10MB of data.  Any ideas?

    Thanks for all of your help,
    Garry
    Tuesday, February 9, 2010 12:12 PM
  • i wouldn't be surprised by the data size if the changes are transmitted as datasets (unless you did it differently in your WCF implementation).

    but if you just passed on the synccontext, and if you trace it, you'll find there are two datasets (one in the synccontext, another one in groupprogress), and in each dataset each row has two copies (a diff, if i am not mistaken). so for each row you have, think of having 4 copies transmitted in xml (imagine all those tags surrounding your data). on the return from calling applychanges on the server side, you'll have more data in the dataset if a conflict happens. so it's the synccontext dataset plus the groupprogress dataset plus the conflict information on the return.

    also, if i am not mistaken, the batchsize is applied to each of the incremental select statements. i may be wrong, but i don't think the batchsize returns only 10 rows for each table in your syncgroup; rather, it's 10 rows for each of the selects (10 for insert, 10 for update, 10 for delete). you can confirm by capturing a sql profiler trace of the incremental selects. (the last sync project i worked on had custom batching, and i'm not sure if the ones i was tracing were the custom selects or the built-in ones from syncfx)


    35 tables seems like a lot for a single syncgroup. try breaking it up into smaller syncgroups if you can.
    Tuesday, February 9, 2010 4:27 PM
  • Hello JuneT,

    Yes, my incremental select statements return all rows when the user is first initializing.  Do I need to modify the incremental select statements for batching?  Below is an example of one of my incremental select statements; I used the base template from the online examples.  As you can see, there isn't anything in this code that filters the results when the user is first synchronizing, so all rows are returned.  Can you please give me an example of how the incremental select statement should be coded for both the initialized and uninitialized cases:

    ALTER PROCEDURE [sync].[uspClientSelectIncrementalInserts]
    	@sync_table_name		nvarchar(100),
    	@sync_initialized		bigint,
    	@sync_last_received_anchor	bigint,
    	@sync_client_id_binary		binary,
    	@sync_new_received_anchor	bigint
    AS
    BEGIN
    
    SET NOCOUNT ON;
    
        IF @sync_initialized = 0 
        Begin
    	-- Return all rows since this is the first time that the client is synchronizing.
    	SELECT	
    		dbo.Client.ClientId,
    		dbo.Client.ClientName		
    	FROM	
    		dbo.Client
        End
        Else
        Begin
    	
    	SELECT	
    		dbo.Client.ClientId,
    		dbo.Client.ClientName
    	FROM	
    		dbo.Client 
    		JOIN CHANGETABLE( CHANGES dbo.Client, @sync_last_received_anchor ) CT ON CT.ClientId = dbo.Client.ClientId
    	WHERE 
    		(	CT.SYS_CHANGE_OPERATION = 'I'
    		    AND CT.SYS_CHANGE_CREATION_VERSION  <= @sync_new_received_anchor
    		    AND (	CT.SYS_CHANGE_CONTEXT IS NULL 
    			    OR	CT.SYS_CHANGE_CONTEXT <> @sync_client_id_binary
    			)
    		)
    		
    	IF CHANGE_TRACKING_MIN_VALID_VERSION( object_id( @sync_table_name ) ) > @sync_last_received_anchor
    	Begin
    	    RAISERROR ( N'SQL Server Change Tracking has cleaned up tracking information for table ''%s''. To recover from this error, the client must reinitialize its local database and try again', 16, 3, @sync_table_name ) 
    	End
        END
        
    END

    Thanks,
    Garry
    Wednesday, February 10, 2010 1:20 AM
  • have you checked this? http://msdn.microsoft.com/en-us/library/cc305322.aspx

    IF @sync_initialized = 0
      SELECT Sales.Customer.[CustomerId], [CustomerName], [SalesPerson], [CustomerType] 
      FROM Sales.Customer LEFT OUTER JOIN 
      CHANGETABLE(CHANGES Sales.Customer, @sync_last_received_anchor) CT
      ON CT.[CustomerId] = Sales.Customer.[CustomerId]
    ELSE
    BEGIN
      SELECT Sales.Customer.[CustomerId], [CustomerName], [SalesPerson], [CustomerType]
      FROM Sales.Customer JOIN CHANGETABLE(CHANGES Sales.Customer, @sync_last_received_anchor) CT
      ON CT.[CustomerId] = Sales.Customer.[CustomerId]
      WHERE (CT.SYS_CHANGE_OPERATION = 'I' AND CT.SYS_CHANGE_CREATION_VERSION 
      <= @sync_new_received_anchor)
    END
    Wednesday, February 10, 2010 2:08 AM
  • Yes, I've seen that article; however, it doesn't solve the initialization issue.

    As you can see, if this is the first time the user is synchronizing (@sync_initialized = 0), then all rows are returned from the database. This select statement doesn't consider the configured batch size.  Therefore, if I have a table with 100,000 rows, it will return 100,000 rows.

    This is my issue.  I have a large table, and all of its rows are returned when the user first initializes.  This is causing the WCF "QuotaExceededException".  I want this table to be returned in batches, to reduce the size of the data being returned.

    Isn't there a way to get batching to work for initialization?

    Thanks,
    Garry
    Wednesday, February 10, 2010 2:36 AM
  • hi garry,

    can you try taking out the if @sync_initialized block and just retaining the code in the else block?
    Wednesday, February 10, 2010 1:00 PM
  • Hello JuneT,

    I took out the IF block and it wasn't calculating the anchors correctly.  I also removed @sync_initialized from the IF statement, which didn't make a difference because @sync_initialized is always zero when @sync_last_received_anchor is zero.

    From what I have gathered, the "Offline" synchronization mechanism doesn't support batching when first initializing.  It only supports batching when querying for database changes after the client has initialized.

    I ran profiler while my client application initialized.  The "change_tracking_current_version" was 400 and the "CHANGE_TRACKING_MIN_VALID_VERSION" for my table was 300.  I set the batch size to be 10.

    All of the parameters were calculated correctly in the "SelectNewAnchorCommand". 

    The profiler trace looked like this

    @sync_batch_count = 10, @sync_last_received_anchor = 0, @sync_new_received_anchor = 310, @sync_max_received_anchor = 400, @sync_initialized = 0
    @sync_batch_count = 10, @sync_last_received_anchor = 310, @sync_new_received_anchor = 320, @sync_max_received_anchor = 400, @sync_initialized = 1
    @sync_batch_count = 10, @sync_last_received_anchor = 320, @sync_new_received_anchor = 330, @sync_max_received_anchor = 400, @sync_initialized = 1
    @sync_batch_count = 10, @sync_last_received_anchor = 330, @sync_new_received_anchor = 340, @sync_max_received_anchor = 400, @sync_initialized = 1
    ... and so on.

    As you can see, after the first batch has been processed, all subsequent batches have the sync_initialized = 1.

    So, on the first batch, it passes @sync_initialized = 0 to the "SelectIncremental*" sprocs.  This causes the entire table to be returned, which is my issue, as it returns too much data.  I was thinking that I could modify the "SelectIncremental*" sprocs to return only X rows based on the batch size, but then I saw that on the subsequent batches @sync_initialized is no longer 0, even though it is still the same sync session.

    I am guessing this is by design and that "Offline" synchronization doesn't support batching during initialization.  Do you know if this is true?

    The only thing that I can think of to work around this issue is to create my own custom mechanism to load the data in batches when the application is first launched (i.e. initialization) and then use the MSF batching for all syncs afterwards.

    Any ideas?

    Thanks,
    Garry
    Wednesday, February 10, 2010 1:41 PM
  • hi garry,

    i tried commenting out the if block in the incremental select as follows, and i can see the sync anchor being passed correctly.

    --IF @sync_initialized = 0
    --  SELECT Sales.Customer.[CustomerId], [CustomerName], [SalesPerson], [CustomerType] 
    --  FROM Sales.Customer LEFT OUTER JOIN 
    --  CHANGETABLE(CHANGES Sales.Customer, @sync_last_received_anchor) CT
    --  ON CT.[CustomerId] = Sales.Customer.[CustomerId]
    --ELSE
    --BEGIN
      SELECT Sales.Customer.[CustomerId], [CustomerName], [SalesPerson], [CustomerType]
      FROM Sales.Customer JOIN CHANGETABLE(CHANGES Sales.Customer, @sync_last_received_anchor) CT
      ON CT.[CustomerId] = Sales.Customer.[CustomerId]
      WHERE (CT.SYS_CHANGE_OPERATION = 'I' AND CT.SYS_CHANGE_CREATION_VERSION 
      <= @sync_new_received_anchor)
    --END

    of course you can roll your own batching solution if you want; i saw one where the solution was to use the batch size with SET ROWCOUNT to force a fixed number of rows to be returned.
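
    a rough, untested sketch of that SET ROWCOUNT idea, using the Sales.Customer example from the article (this assumes @sync_batch_size is also passed in to the incremental select; the ORDER BY makes the truncation deterministic, but you would still need your own anchor handling so that cut-off rows arrive in a later batch):

    DECLARE @batch_limit int
    SET @batch_limit = CAST(@sync_batch_size AS int)

    -- Cap the number of rows returned by the next select at the batch size.
    SET ROWCOUNT @batch_limit

    SELECT Sales.Customer.[CustomerId], [CustomerName], [SalesPerson], [CustomerType]
    FROM Sales.Customer JOIN CHANGETABLE(CHANGES Sales.Customer, @sync_last_received_anchor) CT
    ON CT.[CustomerId] = Sales.Customer.[CustomerId]
    WHERE (CT.SYS_CHANGE_OPERATION = 'I' AND CT.SYS_CHANGE_CREATION_VERSION <= @sync_new_received_anchor)
    ORDER BY CT.SYS_CHANGE_CREATION_VERSION

    -- Reset so later statements in the sproc are not truncated.
    SET ROWCOUNT 0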

    • Marked as answer by Yunwen Bai Wednesday, February 17, 2010 7:54 AM
    Wednesday, February 10, 2010 9:48 PM
  • Hello JuneT,

    Sorry for the late response, I had to move to another project for a few days.

    I decided to implement my own custom mechanism that downloads the initial data from the database.  This custom code downloads the data and inserts a new record into my Anchor table with the "change_tracking_current_version" as the received anchor.  It is fairly simple, just a dynamic sproc on the server and some .NET code on the client.
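
    The server side is roughly this shape (a simplified sketch rather than my exact code; the real sproc builds the select dynamically per table, and the sproc, table, and column names here are placeholders):

    CREATE PROCEDURE [sync].[uspSelectInitialBatch]
        @last_key        int,            -- highest key the client has already downloaded
        @batch_size      int,
        @received_anchor bigint output   -- anchor the client stores once the download completes
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Capture the version before paging, so anything that changes while the
        -- download is running gets picked up by the first incremental sync.
        SET @received_anchor = change_tracking_current_version();

        -- The client calls this in a loop, advancing @last_key,
        -- until fewer than @batch_size rows come back.
        SELECT TOP (@batch_size)
            ClientId,
            ClientName
        FROM dbo.Client
        WHERE ClientId > @last_key
        ORDER BY ClientId;
    END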

    I didn't go with the approach mentioned above, as I don't see how it works.  The "SelectNewAnchor" sproc gets called once per session group.  If there are multiple tables in the group, they can all have different "@sync_last_received_anchor" values because some of the tables might have been cleaned up.

    Therefore, in each of the incremental select sprocs, I need to determine which received anchor should be used for the given table.  This would be fine if the received anchor were an output parameter and if "@sync_initialized" were always zero during the initialization session.  But as mentioned in my previous post, the "@sync_initialized" parameter is only false for the first batch; all subsequent batches have this parameter set to true.

    I really don't think that the batching mechanism for offline scenarios was designed for initialization.  I think that this oversight was corrected in V2 for the collaboration scenario, and I will move to that scenario when collaboration supports SQL Server Change Tracking and filtering... hopefully that will be in the next version.

    Thanks again for your help,
    Garry
    • Marked as answer by Yunwen Bai Wednesday, February 17, 2010 7:54 AM
    Wednesday, February 17, 2010 12:12 AM
  • glad you had it working. it's actually easier doing it "outside" of SyncFx. i did a similar approach with SyncFx V1: a separate call to the server db to determine the lowest timestamps for each table, and then saving them as the starting anchor on the client side.
    Wednesday, February 17, 2010 12:49 AM
  • Glad to hear it worked with your own logic. JuneT's suggestion should work, as it treats the initial sync and subsequent syncs with the same select query. The trick here is that the V1 batching actually batches on the "anchor values", not the base table rows, so with an initial sync you will need to "arbitrarily" spread the anchor values (e.g. dummy-update rows in different transactions) to force the batching logic to work; see the sketch below.
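
    For illustration only (you would not want to do this on a large table; the names are from the Sales.Customer example), dummy updates in separate transactions give existing rows distinct change tracking versions for the anchor logic to step through:

    -- Touch each row in its own transaction; every committed transaction advances
    -- CHANGE_TRACKING_CURRENT_VERSION(), so the existing rows end up spread
    -- across a range of versions instead of all sharing one.
    DECLARE @id int

    DECLARE row_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT [CustomerId] FROM Sales.Customer
    OPEN row_cursor
    FETCH NEXT FROM row_cursor INTO @id

    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRANSACTION
        -- A no-op update is still recorded by change tracking.
        UPDATE Sales.Customer SET [CustomerName] = [CustomerName] WHERE [CustomerId] = @id
        COMMIT TRANSACTION
        FETCH NEXT FROM row_cursor INTO @id
    END

    CLOSE row_cursor
    DEALLOCATE row_cursor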

    the batching was totally redesigned in V2, and it uses memory size as the batch size, so it will be based on the data itself. I would recommend giving it a try -- however, you will not be able to use SQL Change Tracking, as it is not currently supported by the V2 collaboration providers.

    thanks
    Yunwen


    This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, February 17, 2010 7:54 AM
  • I know it's been 4 years, but would you happen to have any code for your custom solution that you could share?

    I would like to get batching working with SQL Change Tracking with Sync 2.1 (using 2.0 objects); from the information I've gathered, this seems impossible.
    Tuesday, March 4, 2014 3:37 PM