SyncFx 1.0 for Devices, row_number() instead of Batching with TimeStamps?

  • Question

  • Hi,

    I have an app that uses SyncFx 1.0 on a WM6 device, using GZip over SSL, against a WCF web service, to a SQL 2k5 DB.

    I had originally used DateTime as the tracking fields for sync diffs, and then started getting OutOfMemoryExceptions when GZip tried to decompress to a MemoryStream on the initial sync with large amounts of data, so I implemented batching.

    With the batching, I was able to (finally) get it working with the DateTime fields, but then realized that because I had added the DateTime columns to a pre-existing table, the CreationDate (and/or LastEditDate) values were all identical, so even with batching the SelectIncrementalUpdatesCommand still couldn't 'chunk' the data. So I ripped all that out and used timestamp columns (UpdateTimestamp, InsertTimestamp) and client ID tracking columns (UpdateID, InsertID).

    I found that when those columns were added to an existing table:

    - the scripts on MSDN to add them created the columns as NULLable, so the @@DBTS default was never applied and the InsertTimestamp column was always NULL. I ended up running a separate script to SET InsertTimestamp = UpdateTimestamp WHERE InsertTimestamp IS NULL, which kind-of worked, but then changed the columns to NOT NULL in the add scripts and that worked better.

    - the timestamp calculation in usp_GetNewBatchAnchor, which produces LastReceivedAnchor on the initial sync (and causes issues with subsequent syncs as well), had to take MIN(theTimestampCol) across all my synched tables to find the very first one (i.e. so it doesn't start at zero and churn through a million empty sync batches before it gets anywhere). I still find that implementation problematic: with my batch size of 400 'rows', it calculates that, say, 27 batches are needed to go from SyncLastReceivedAnchor to SyncMaxReceivedAnchor (whether that's @@DBTS or MIN_ACTIVE_ROWVERSION() doesn't seem to matter), and then uses the same 'from'/'to' values for every table's SelectIncrementalUpdatesCommand and SelectIncrementalInsertsCommand, which comes up with a ton of empty batches.
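    For reference, the seed calculation I ended up with looks roughly like this -- a sketch only, with placeholder table/column names (casting the rowversion to bigint makes MIN straightforward on SQL 2k5):

    ```sql
    -- Sketch: seed the first anchor from the lowest tracking timestamp
    -- across all synched tables, instead of starting at zero.
    -- dbo.Table1/Table2 and the column names are placeholders.
    DECLARE @minTs bigint;

    SELECT @minTs = MIN(ts)
    FROM (
        SELECT MIN(CAST(UpdateTimestamp AS bigint)) AS ts FROM dbo.Table1
        UNION ALL
        SELECT MIN(CAST(InsertTimestamp AS bigint)) FROM dbo.Table1
        UNION ALL
        SELECT MIN(CAST(UpdateTimestamp AS bigint)) FROM dbo.Table2
        -- ...one SELECT per tracked table/column
    ) AS allMins;

    -- Use @minTs - 1 (converted back to binary(8)) as the starting
    -- @sync_last_received_anchor so the first batch begins at real data.
    ```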

    So, I found this post (http://social.microsoft.com/Forums/en/uklaunch2007ado.net/thread/4fd6df1a-52bb-47e9-9ff1-6427bf577b92), and it seems that row_number() can be used somehow, and it kind-of makes sense to me, but the sample code in there is for SQL 2k8 with Change Tracking and I'm not exactly sure how to implement it under my current SQL 2k5 architecture.

    Also, I would have just replied to that post for assistance but it seems to be read-only, and I can't find any way to contact the guy who seems to have a clue -- http://social.microsoft.com/profile/en-US?user=OverloadedOverrides

    Can anyone help me out with this?  


    Regards, 

    Charles R. Tomlinson | Senior Consultant/Developer | Alocet Inc.


    Tuesday, April 6, 2010 8:11 PM

Answers

  • hi chuck,

    I'm not sure the SSCE 3.5 SP2 Books Online has hit the web yet -- I'm still getting errors trying to access it. But then, Change Tracking can only give you what has changed; you still have to figure out how to handle the sync knowledge of what was sent and received by which peer.

    Here's a good reading on understanding the Sync Knowledge: http://msdn.microsoft.com/en-us/library/bb902834(v=SQL.100).aspx

    I will look further into your other issue on the batching when I get some free time over the weekend. But you may want to also try the other approach I mentioned of breaking your sync into smaller sync groups (e.g., instantiate a sync agent, sync group 1 with 5 tables, synchronize; instantiate another sync agent, sync group 2 with 5 tables, and so on).

    V2 has memory-based batching, but unfortunately device support is still on V1.

    V3, I believe, will have a better support story for devices (have a look at the sync protocol in the MIX 2010 sample sync app).

    Thursday, April 15, 2010 11:07 AM
    Moderator

All replies

  • you can try the approach here for reducing the empty batches: http://social.microsoft.com/Forums/en-US/syncdevdiscussions/thread/08e6f9dd-bc48-4742-9cad-aea0290745b2/

     the approach checks whether the new anchor retrieves rows; if it does, it passes that as the anchor, and if it doesn't, it retrieves a new anchor.
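     in rough T-SQL, the idea looks something like this -- a sketch with placeholder names, not the exact code from that thread:

    ```sql
    -- Sketch: advance a candidate anchor until the window actually
    -- contains rows, so empty batches get skipped. Names are placeholders;
    -- @sync_last_received_anchor comes in from the sync session.
    DECLARE @rows int, @last bigint, @temp bigint, @max bigint, @batch bigint;

    SET @batch = 400;
    SET @last  = CAST(@sync_last_received_anchor AS bigint);
    SET @max   = CAST(MIN_ACTIVE_ROWVERSION() AS bigint) - 1;
    SET @temp  = @last + @batch;
    SET @rows  = 0;

    WHILE @rows = 0 AND @temp < @max
    BEGIN
        SELECT @rows = COUNT(*)
        FROM dbo.MyTable
        WHERE CAST(UpdateTimestamp AS bigint) > @last
          AND CAST(UpdateTimestamp AS bigint) <= @temp;

        IF @rows = 0
            SET @temp = @temp + @batch;  -- empty window: slide forward
    END

    -- @temp (capped at @max) becomes the new received anchor.
    ```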

    Wednesday, April 7, 2010 9:19 AM
    Moderator
  • Thanks, but that's similar to the approach in my posted link, where the person was also using SQL 2008 with Change Tracking -- I'm using SQL 2k5.
    Wednesday, April 7, 2010 1:18 PM
  • you just have to change the query on the change tracking table to query your table directly and get the row count. the replacement query will be similar to a SelectIncremental command, except that you don't select the columns, just the count.

    e.g.

    SELECT @rows = COUNT(*)
    FROM dbo.mytable
    WHERE [timestampcolumn] > @sync_last_received_anchor AND [timestampcolumn] <= @temp_anchor

    Wednesday, April 7, 2010 1:22 PM
    Moderator
  • ok, but what if I have 20 tables that are being synched? 

    If I don't pass @sync_last_received_anchor, then it already checks min(UpdateTimestamp) and min(InsertTimestamp) on all tables to figure out the lowest timestamp for the initial sync, and then figures out how many batches lie between that and MIN_ACTIVE_ROWVERSION()-1 [i.e. the maxTimestamp]. So for each batch it now tries between minTimestamp and minTimestamp+400 (the batch size) on each table, where for the most part there may not be any records... What I'd like is for it to take, for instance, 30 records from Table1, 250 from Table2, and the remaining 120 from Table3, then continue the next batch with the remainder from Table3, and so on, until maxTimestamp...

    Right now, an initial sync takes 30 minutes to pull 2940 records, in 41 batches [my calculation is 2940/400 = 7.35 batches?!], which is just way too long... (and yes, I have that performance HotFix in)

    Thoughts?

    Wednesday, April 7, 2010 5:25 PM
  • you would do the row count check the same way you get the min timestamp from all the tables.

    gaps, fragmented ranges, and empty batches are inherent in anchor-based synchronization (compared to the collaboration providers' knowledge-based sync), so you can't totally get rid of them -- only minimize them.

    have you considered breaking up your tables into multiple sync groups in different sync sessions?

    with regards to "fragmentation" in the timestamps caused by frequent updates/deletes (especially on the server), I find it useful, when starting a sync project where there's no sync client yet, to do dummy updates on all rows (update tablex set colx=colx) or to copy the contents of a table to a temp table, drop the original table, and rename the temp table. This at least makes the create/update timestamps "contiguous" at the start.
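    roughly, the dummy-update trick is just something like this (table and column names are placeholders -- SomeColumn is any updatable column):

    ```sql
    -- Sketch: touch every row so each one gets a fresh, contiguous
    -- rowversion. Do this only before any client has synched, since it
    -- marks every row as changed.
    UPDATE dbo.Table1 SET SomeColumn = SomeColumn;
    UPDATE dbo.Table2 SET SomeColumn = SomeColumn;
    ```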

     

    Thursday, April 8, 2010 10:19 AM
    Moderator
  • Hey June,

    My tables are in SyncGroups -- I have 2: one for the BiD tables and one for DLonly. What do you mean about different sync sessions?

    I'm not opposed to ripping everything apart to see if I can get 'knowledge-based' sync working. Can you elaborate on how I might implement that from a WM6 device with SSCE to SQL 2k5 via WCF? (I presume WCF is still required in that case?)

    When I add the columns, UpdateTimestamp is a timestamp and InsertTimestamp is a binary(8) DEFAULT @@DBTS+1.
    It adds the InsertTimestamp column as NULL, and I then UPDATE @tableName SET InsertTimestamp = UpdateTimestamp WHERE InsertTimestamp IS NULL -- but that increments UpdateTimestamp, so the two are never the same and are always off by (approximately) the number of records in the table. And seemingly I cannot set the default on InsertTimestamp to [UpdateTimestamp] because 'column names are not allowed'. I've also tried making InsertTimestamp NOT NULL, which applies the default as it should, but then every row gets the same value. I've even tried ALTER TABLE @tableName ALTER COLUMN UpdateTimestamp binary(8) before my update, to turn off the 'timestamping', but alas: "Cannot alter column 'UpdateTimestamp' because it is 'timestamp'."
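    For clarity, here's roughly what my add-column scripts look like (sketch only -- the table and constraint names are placeholders, not my exact scripts):

    ```sql
    -- Sketch: add the tracking columns to an existing table.
    ALTER TABLE dbo.MyTable ADD
        UpdateTimestamp timestamp NOT NULL;   -- auto-maintained rowversion

    ALTER TABLE dbo.MyTable ADD
        InsertTimestamp binary(8) NOT NULL
            CONSTRAINT DF_MyTable_InsertTs DEFAULT (@@DBTS + 1);
    -- With NOT NULL, existing rows are backfilled from the default in one
    -- operation, so they all end up with the same InsertTimestamp; rows
    -- inserted later pick up the @@DBTS value current at insert time.
    ```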

     Edit: Also, I'm trying to do the row-counting thing, but I need @UserName (a custom SyncParameter) and @sync_client_id, and I can't seem to get at the SyncSession parameters from within MySyncProvider.InitializeNewAnchorCommand() in my WCF layer. Do I have to do something in SyncAgent.InitializeSyncTables() like I did when adding the custom SyncParameter for @UserName?


    Friday, April 9, 2010 3:46 PM
  • By different sync sessions I mean instantiating a different sync agent for each group of tables.

    To get the knowledge-based sync behaviour similar to the collaboration providers, you may have to write a custom provider.

    With regards to your InsertTimestamp not working, is it failing on the SQL 2k5 DB?

    How did you add your custom parameters? If you added the parameters in the SQL statements, you should be able to assign the values in the SyncAgent's configuration (Configuration.SyncParameters.Add).

    Is your sync code generated from the Designer (Local Database Cache) or manually created?


    Saturday, April 10, 2010 12:15 AM
    Moderator
  • Hi June,

    First, thanks for your help thus far... I only just found your blog the other day! You need to stick it in the signature of your messages =)

    So, I've now got a couple of tables 'working' where it keeps incrementing my @temp_anchor until it has rows and then returns it as the @new_anchor at the end, BUT my main issue re. Out of Memory (from before I tried batching as a solution) has come back. I see why, though -- it calculates the highest timestamp with rows, and on the first @sync_table_name the anchor runs all the way up to @@DBTS (looking for a later timestamp that returns rows) because that table happens to be empty. So when it gets to my BIG table it fails (when the GZipped dataset decompresses), just as before, because it's pulling the whole thing.

    One thing I did notice is that when I pass @sync_table_name, SQL Profiler shows it running ALL the SelectIncrementalInsert/Update/Delete commands for all tables between the start/end timestamps -- is it supposed to do that? Even with the row counting, if it's hitting all tables then there's still the possibility of the OOM.

    It would be nice if I could batch based on ACTUAL memory instead of row count, because some rows carry little data and some carry a ton in their columns -- my batch size is currently 400 rows simply because that was the largest I could get through my sample data file without an OOM. I'm almost ready to try your byte-array approach, but would love to get this working since I feel so close to a solution now...

    Last, I should probably split this into another question, but figured I'd ask anyway. We were talking about knowledge-based sync versus the limitations of anchor-based, and you mentioned a custom provider -- with SSCE 3.5 SP2 being released, is there any documentation on where I might begin with the new client-side change tracking API? I've spent a LOT of time working through anchor-based sync but keep hitting blocks as I get past piece after piece. I'd like to spend a day away from anchor-based sync drilling through a raw knowledge-based sync, at least to get an idea of the scope of work.

    thx.
    CRT.

    Wednesday, April 14, 2010 2:23 PM