SQL Server Change Tracking and Batching

  • General discussion

  • After a good amount of investigation, our application is applying batching to a database synchronized using SQL Server change tracking.  I found some limitations in the Sync Framework while getting this up and running.  I am posting my findings here in the hope that Microsoft will add new features to the Sync Framework in the next release.  This would help me make my code less complicated and more maintainable.

     

    The first issue with SQL Server change tracking and batching is finding a proper segmentation point for the batches.  Using the procedures created for us by the sync adapters, you wind up with a statement that segments batches based on change tracking version numbers.  In my testing this was unacceptable and unpredictable.  SQL Server change tracking appears to assign version numbers per TRANSACTION, not per raw INSERT, UPDATE, or DELETE.  Therefore, if your objective is to segment over version numbers, this becomes impossible when, for example, many of your inserts occur inside a single transaction; all of the version numbers would be exactly the same, giving nothing to segment on.  I overcame this issue by staging my batches using the ROW_NUMBER function, as sketched below.  This greatly improved performance, and I no longer had empty batches or partially filled batches.
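    As a rough illustration of the idea (the table and column names below are just placeholders, not my production code), segmenting on ROW_NUMBER instead of the tracking version looks something like this:

        -- Sketch: page rows by a stable ROW_NUMBER instead of the change
        -- tracking version so every batch comes back full.
        SELECT [Columns]
        FROM
        (
            SELECT
                [Columns],
                ROW_NUMBER() OVER (ORDER BY [PrimaryKey]) AS DBROWID
            FROM [MyTable]
        ) AS NumberedRows
        WHERE DBROWID >  @sync_last_received_anchor
          AND DBROWID <= @sync_new_received_anchor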

     

    My main limitation in getting this all working was the batch anchor command.  I needed to send a few more parameters to it.  I expected my custom session parameters to be available here; however, they were not.  I got around this by caching the parameters I needed using a WCF call prior to calling Synchronize, and then picking the parameters up via the session and client ids.  I think it would be nice if anchor commands supported custom parameters.
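    Roughly, the workaround looks like this on the server (SessionParameters is a hypothetical cache table, not part of the Sync Framework; the WCF call fills it before Synchronize runs, and the anchor procedure reads the values back by client id):

        -- Hypothetical cache table populated by the middle-tier call.
        CREATE TABLE SessionParameters
        (
            syncClientId uniqueidentifier NOT NULL,
            paramName    nvarchar(128)    NOT NULL,
            paramValue   nvarchar(256)    NULL
        )

        -- Inside the anchor procedure, pick the cached values back up.
        DECLARE @StartDate datetime
        SELECT @StartDate = CAST(paramValue AS datetime)
        FROM SessionParameters
        WHERE syncClientId = @sync_client_id AND paramName = 'StartDate'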

     

    Another issue you should be aware of is that the items you wish to batch will not have tracking info on the initial sync if cleanup has ever occurred on the database.  Therefore, it is necessary to batch the initial INSERT sync using something altogether different from tracking version numbers.  Again, ROW_NUMBER worked nicely for this purpose.

     

     

    • Moved by Hengzhe Li Friday, April 22, 2011 3:10 AM (From:SyncFx - Microsoft Sync Framework Database Providers [ReadOnly])
    Wednesday, January 21, 2009 7:58 PM

All replies

  • I am in a similar situation, almost at the point of 86'ing batching altogether.  About 1 out of every 10 cases gets an OOM exception, which is forcing me to keep struggling with batching.  My problem lies in empty batches, slow performance, etc.  I believe the root of the problem is a poor implementation of GetNewBatchAnchor().  How did you implement your version of batching using ROW_NUMBER?  I am somewhat new to SQL 2008, and I am sure there are a lot of features I am not taking full advantage of.

    Thanks in advance.
    Tuesday, March 3, 2009 4:17 PM
  •  Sorry for the delay in reply.  Here is what I ended up doing. 

    Everything starts at the client in this manner...

      1. Make a middle-tier call on the client before calling the Synchronize method.  The call does the following (note that most of this work is done only because the batch anchor command cannot read session variables; otherwise I would have done this much more efficiently):
      1. Clean up any previous cached parameters for this client which may still be in memory from a previous operation.
      2. Under the context of the SqlCeClientSyncProvider.ClientID upload a list of tables and filtering parameters which I shall use for the sync operation.
    2.  Make a call to DataSyncAgent.Synchronize

    The server is now in control...

    • In my anchor creation procedure I "Generate a SYNCH Plan."  Essentially what I am doing here is pre-determining how much data the client is actually requesting rather than relying on anchor points that guesstimate this.  It is my opinion that it is much easier for me to upgrade my server and absorb this overhead than to ask the client to upgrade to a T1 in order to download numerous empty batches.  The largest problem is batching data to a new client database.  I'll go over this first and then cover batching subsequent syncs later when I have more time.
    --GENERATE A SYNCH PLAN

    -- Collect the user tables requested for this session that are
    -- actually configured for change tracking.
    INSERT INTO @Tables
    SELECT
        sysobjects.[name],
        sysobjects.[id]
    FROM
        @SessionTables AS S
        INNER JOIN sysobjects ON sysobjects.[name] = S.[name]
        INNER JOIN sys.change_tracking_tables AS T ON sysobjects.[id] = T.[object_id]
    WHERE
        sysobjects.xtype = 'U'
    GROUP BY sysobjects.[name], sysobjects.[id]

    SET @DBCount = 0

    DECLARE tableCursor CURSOR FOR
    SELECT
        [name],
        [id]
    FROM @Tables

    DECLARE @COUNTTABLE TABLE (rowNumber int)

    OPEN tableCursor
    FETCH NEXT FROM tableCursor INTO @name, @id
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Run each table's GetInserts procedure once to count how many
        -- rows this client would actually receive.
        SET @procName = 'usp_synch_' + @name + '_GetInserts'
        DELETE FROM @COUNTTABLE

        BEGIN TRAN
        INSERT INTO @COUNTTABLE
        EXEC @procName
            '',
            @StartDate,
            @EndDate,
            0,
            -1,
            @batchAnchor,
            @sync_client_id_binary,
            @sync_client_id,
            null,
            null,
            1,
            @sync_group_name
        COMMIT TRAN

        --SET @tableCount = @@ROWCOUNT
        SELECT @tableCount = COUNT(rowNumber) FROM @COUNTTABLE

        -- Record this table's row count and its running offset (@DBCount)
        -- so the anchor command can slice batches across tables.
        INSERT INTO SynchPlans (syncSessionId, syncClientId, sessionAnchor, sessionLastAnchor, name, TableRows, DBRows, syncInitialized, batchName)
        VALUES (@sync_session_id, @sync_client_id, @batchAnchor, 0, @name, @tableCount, @DBCount, 0, @sync_group_name)

        SET @DBCount = @DBCount + @tableCount

        FETCH NEXT FROM tableCursor INTO @name, @id
    END
    CLOSE tableCursor
    DEALLOCATE tableCursor

    --DETERMINE TOTAL ROWS IN PLAN
    SELECT @sync_max_received_anchor = MAX(DBRows) FROM SynchPlans WHERE syncSessionId = @sync_session_id
END  -- closes a block begun earlier in the procedure (not shown)
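
    For reference, the SynchPlans work table looks roughly like this; the column types are inferred from how the procedure above uses them, so treat it as a sketch rather than the exact definition:

        -- Sketch of the SynchPlans work table used by the plan procedure.
        CREATE TABLE SynchPlans
        (
            syncSessionId     uniqueidentifier NOT NULL,  -- @sync_session_id
            syncClientId      uniqueidentifier NOT NULL,  -- @sync_client_id
            sessionAnchor     bigint           NOT NULL,  -- change tracking version at plan time
            sessionLastAnchor bigint           NOT NULL,
            name              sysname          NOT NULL,  -- table being planned
            TableRows         int              NOT NULL,  -- rows this table contributes
            DBRows            int              NOT NULL,  -- running offset before this table
            syncInitialized   bit              NOT NULL,
            batchName         nvarchar(128)    NULL       -- @sync_group_name
        )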

    • As you can see, the procedure builds a table called SynchPlans which records the number of rows to obtain for each table and the position of those rows relative to the batch, i.e. @DBCount.  Next, the anchor command uses this info to size the batch count and distribute the data evenly.
        IF @sync_last_received_anchor IS NULL OR @sync_last_received_anchor = 0
            BEGIN
                -- First batch: start the row-based anchor at 1 and advance it by the batch size.
                SET @sync_last_received_anchor = 1
                SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size
                IF @sync_batch_count <= 0
                    SET @sync_batch_count = ((@sync_max_received_anchor / @sync_batch_size) - (@sync_last_received_anchor / @sync_batch_size)) + 1
            END
        ELSE
            BEGIN
                -- Subsequent batches: keep advancing the row-based anchor by the batch size.
                SET @sync_new_received_anchor = @sync_last_received_anchor + @sync_batch_size

                IF @sync_batch_count <= 0
                    SET @sync_batch_count = ((@sync_max_received_anchor / @sync_batch_size) - (@sync_last_received_anchor / @sync_batch_size)) + 1
            END

        -- Final batch: once the row-based anchor passes the plan total, hand the
        -- client the real change tracking anchor recorded when the plan was built.
        IF @sync_new_received_anchor >= @sync_max_received_anchor
        BEGIN
            SELECT @sync_new_received_anchor = MIN(sessionAnchor) FROM SynchPlans WHERE syncSessionId = @sync_session_id
            IF @sync_batch_count <= 0
                SET @sync_batch_count = 0
        END
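
    To make the batch-count arithmetic concrete with some made-up numbers: if the plan totals @sync_max_received_anchor = 1000 rows, @sync_batch_size = 250, and the client starts at @sync_last_received_anchor = 1, integer division gives ((1000 / 250) - (1 / 250)) + 1 = (4 - 0) + 1 = 5, i.e. four row-sized batches plus the final pass that swaps in the real change tracking anchor.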
    • What is important to note above is that sync_new_received_anchor is initially incremented only by the batch size.  Once the new anchor is greater than the max anchor, different logic is applied to make sure the client records the correct change tracking anchor.  The actual insert procedures themselves use ROW_NUMBER to segment the batch data.
    IF EXISTS(
            SELECT SP.[name] FROM SynchPlans SP
            WHERE SP.name = 'Table'
            AND SP.syncSessionID = @sync_session_id
            AND SP.DBRows <= @sync_new_received_anchor
            AND SP.DBRows + SP.TableRows > @sync_last_received_anchor
            )
        BEGIN
            DECLARE @anchor bigint
            SELECT @anchor = sessionAnchor FROM SynchPlans SP WHERE SP.name = 'Table' AND SP.syncSessionID = @sync_session_id

            SELECT
                [column]
            FROM
            (
                SELECT
                    [column],
                    -- Number the rows by primary key and offset by the table's
                    -- position in the plan so batch boundaries span tables.
                    ROW_NUMBER() OVER (ORDER BY [PK]) + SP.DBRows AS DBROWID
                FROM
                    [Table]
                    INNER JOIN SynchPlans SP ON SP.name = 'Table' AND SP.syncSessionID = @sync_session_id
                    LEFT JOIN CHANGETABLE(CHANGES [Table], @anchor) CT ON [Table].[PK] = CT.[PK]
                WHERE
                (
                    (CT.SYS_CHANGE_OPERATION = 'I' AND CT.SYS_CHANGE_CREATION_VERSION <= @anchor AND CT.SYS_CHANGE_CONTEXT <> @sync_client_id_binary)
                    OR CT.SYS_CHANGE_CONTEXT IS NULL
                )
            ) AS NumberedRows
            WHERE
                DBROWID <= @sync_new_received_anchor
                AND DBROWID > @sync_last_received_anchor
        END


    Tuesday, March 10, 2009 9:54 PM
  • Thanks for the reply.
    This looks like a great solution.  One question I have regarding the synch plan creation: what (if any) parameters would need to be passed to the sp?
    Also, it looks like you are only looking at updates (sysobjects.xtype = 'U'); would I have to incorporate inserts and deletes here in any way?
    I am pretty new to T-SQL, so bear with me.

    Thanks again.
    Wednesday, March 11, 2009 12:43 PM
  • sysobjects.xtype = 'U' is not part of change tracking.  It is a way to query the sysobjects table for just user tables and then INNER JOIN on sys.change_tracking_tables to ensure all tables in the batch are indeed configured for change tracking.
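
    Standing alone, that check is just enumerating which user tables have change tracking enabled:

        -- List the user tables that actually have change tracking enabled.
        SELECT o.[name], o.[id]
        FROM sysobjects AS o
        INNER JOIN sys.change_tracking_tables AS t ON o.[id] = t.[object_id]
        WHERE o.xtype = 'U'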

    This first example only deals with inserts because that is what an uninitialized client has the most trouble with when using SQL Server Change Tracking.  If you have cleanup turned on, you will have NO change history available for rows older than, say, 3 days.  Therefore, the poor segmentation you get from tracking version numbers becomes even worse in this scenario; most records will have absolutely nothing to segment on at all.  What this example shows is a way to overcome this by substituting ROW_NUMBER for change tracking version numbers as the segmentation point.

    Wednesday, March 11, 2009 1:30 PM
  • Excellent feedback OO!  We have actually improved batching significantly in the next release to address issues around complexity and OOM errors.  You will have a chance to get your hands on this new feature in the next CTP of the Sync Framework, which is coming soon.  The issue you specifically describe around a skewed distribution of DML operations has been eliminated entirely for all common cases.  The one caveat is a scenario where you are synchronizing files stored in your database and one of those files is extremely large.  Again, thanks for your feedback and your willingness to share your ideas with others!

    Sean Kelley
    Program Manager
    Microsoft
    Thursday, March 12, 2009 10:34 PM
    Moderator
  • OverloadedOverrides, I don't really understand what you're doing to implement batching with the integrated change tracking of SQL 2008.

    Could you document it just a little more, or upload a full code example?
    It would be great !! ^^


    Next..
    When does Microsoft Sync plan to have the functionality of batching (on both sides, upload and download)?
    We are excited about the sync framework and are waiting for more functionality...


    Tnx.
    Thursday, March 26, 2009 2:00 PM
  • We have this all up and running in production currently.  Everything is working great; however, I am interested in how many breaking changes we will have to make to our custom batching implementation.  Additionally, I was considering writing my own client sync provider to do bulk inserts to SQL CE for performance reasons, but I decided not to deviate further from the standard.

    Another issue which the Sync Framework does not fully address is dynamic row filtering.  By this I mean filtering table rows based on a related identifier such as a location ID, a user ID, or, most complicated of all, a date range.  Again, I was able to solve this issue, but the solution was exceedingly complicated, which causes concerns for continued application support.
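
    To give a sense of what I mean by dynamic row filtering, the enumeration queries end up carrying extra predicates along these lines (the @location_id parameter and the column names are purely illustrative):

        -- Illustrative only: filter a client's enumeration query by a
        -- location and a date range supplied at sync time.
        SELECT [Columns]
        FROM [MyTable] T
        WHERE T.LocationID = @location_id
          AND T.ModifiedDate >= @StartDate
          AND T.ModifiedDate <  @EndDate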

    When will the new CTP be available?
    Will the CTP contain a fix for client initialization INSERT speed?
    Will the CTP propose a new filtering methodology in addition to an improved batching algorithm?

    Thank you in advance!
    Monday, June 1, 2009 6:05 PM
  • Hi Sean,

    when will this release be available?  Can't you give out a pre-release?
    People here have run into trouble with batching several times, and I think it's not worth wasting any time on it
    if it will all be different in the next release.


    Regards,
    Martin
    Tuesday, June 2, 2009 7:40 AM