Parallel synchronization

  • Question

  • Is it possible to sync two or more clients with the server in parallel?
    If so, what is the limit?

    Thanks
    Archan
    • Moved by Max Wang_1983 Thursday, April 21, 2011 12:40 AM forum consolidation (From:SyncFx - Technical Discussion [ReadOnly])
    Wednesday, July 29, 2009 9:02 AM

Answers

  • Yes, Sync Framework's design imposes no limit on having one endpoint involved in multiple syncs in parallel, but this is likely to lead to unpredictable results due to concurrency, and the server (or whatever data store you are using) must be able to support parallel access to the data. In addition, your provider should be able to handle optimistic concurrency, i.e. it should be able to tell that the version of the item it is uploading/downloading is the same as the version it reported earlier in the sync session (via GetChangeBatch and GetVersion). If it is not, the provider (whether source or destination) should set a recoverable error and move on.
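The version check described above can be illustrated with a minimal sketch. This is plain Python against a toy in-memory store, not the Sync Framework API: `Store`, `Item`, `get_version`, and `apply_change` are hypothetical names chosen for illustration, and a real data store would have to make the check and the write atomic.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    value: str
    version: int = 0


@dataclass
class Store:
    """Toy in-memory store standing in for the server-side data store."""
    items: dict = field(default_factory=dict)

    def get_version(self, key):
        item = self.items.get(key)
        return item.version if item else None

    def apply_change(self, key, new_value, expected_version):
        """Apply the change only if the item is still at the version the
        caller saw earlier in its session; otherwise report a recoverable
        error so the session can skip the item and move on."""
        if self.get_version(key) != expected_version:
            return "recoverable_error"
        current = expected_version or 0
        self.items[key] = Item(new_value, current + 1)
        return "applied"


store = Store()
store.items["row1"] = Item("original", version=1)

# Two sessions both read the same version before either one writes.
seen_by_a = store.get_version("row1")
seen_by_b = store.get_version("row1")

result_a = store.apply_change("row1", "from client A", seen_by_a)
result_b = store.apply_change("row1", "from client B", seen_by_b)
```

In this sketch the second session's write is rejected instead of silently overwriting the first, and the skipped item can be picked up again in a later sync.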


    But as Michael mentions, parallel syncs may cause tricky problems (for example, a change written by one client being overwritten by another without the second client knowing about the first client's change) and many similar issues.

     

    If, however, the clients are affecting disjoint datasets on the server (say each client updates data for a specific state such as NY or WA), it may work, provided the datasets are truly disjoint (no relation whatsoever).

     

    Sameer

     

    Tuesday, August 18, 2009 11:24 PM
  • A few more questions, Archan:

    1. Are all 150 tables changing all the time, or do some change more often while others are relatively static? If some are static, you should be able to make that reference data download-only, which will improve performance.
    2. What version of Sync Framework are you using?
    3. Since you mention SQL Server Enterprise syncing with a SQL Express client, are you using the out-of-the-box providers (from v2 CTP2), or have you based your work on the sample we posted?
    4. A high-performance machine serving as the server will help in your situation. In particular, moving to x64 architecture has proven effective in our testing.

    We have tested 400 concurrent clients synchronizing to the server. However, the schema and workload are definitely different from yours, so these numbers should serve only as a guideline. Sync Framework does not put a ceiling on the number of clients that can synchronize.


    This posting is provided AS IS with no warranties, and confers no rights
    Friday, September 4, 2009 11:10 PM

All replies

  • I doubt that syncing with parallel or concurrent clients would work well, because the "changes" to be synced are determined at the database or store level (which is a singleton) and the watermark is shared. This suggests that an item selected for sync could be selected again by another sync client instance before the first client completes its synchronisation. That would make parallel syncing pointless, if not troublesome.
    Wednesday, August 12, 2009 12:03 AM
  • As Sameer notes, if you are writing providers using Sync Framework, then it largely depends on the data store you are trying to sync and how the providers are written.
    However, if you are using the out-of-the-box database providers that ship with Sync Framework and are working with, say, a back-end SQL Server database, then most of the concurrency handling is already taken care of for you. In fact, in a true hub-and-spoke scenario involving a SQL Server hub and other SQL Server or SQL Compact clients, we have successfully tested a good number of clients synchronizing with the server concurrently.
    Sunday, August 23, 2009 4:43 PM
  • I've tested the parallel execution scenario with multiple sync clients running concurrently against the same sync service for the same sync scope, using the out-of-the-box SqlSyncProvider.

    I saw deadlock exceptions on the server side at the ApplyChanges stage, but at the end of the test the two databases were in sync.

    My view on this subject, based on the test results:

    • Sync Framework does NOT work well in parallel. One change from the source could be picked up by multiple sync clients and applied concurrently on the server through different sync sessions. This is my explanation of how the deadlock error occurred.
    • However, the next synchronization cycle brings the two databases back in sync after a deadlock has occurred on the server side.

    One may not care about the server-side deadlock error, on the assumption that another sync cycle will happen soon and bring the two databases back in sync. This may well be the case, but it is not the same as saying that SqlSyncProvider natively supports synchronization in parallel.

    Any comment, Mahesh?

    Tuesday, August 25, 2009 2:26 AM
  • Hi Michael,

    Concurrent synchronization against SQL Server using the out-of-the-box providers (SQL sync provider, server provider) is absolutely possible and very common in almost all of our deployments. We test hundreds of concurrent syncs in our performance and scale test labs. Having said that, it is not possible to guarantee that no deadlocks will occur, since a lot depends on the schema of the database (especially the indexes on tables) and the data change patterns. We typically run on representative schemas and workloads that we get from a few customers.
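Since deadlocks cannot be ruled out entirely, one way a client might cope is to retry the failed sync after a short, randomized delay. The following is a minimal sketch in plain Python, not the Sync Framework API: `DeadlockError`, `run_with_retry`, and `flaky_sync` are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error reported by the database."""


def run_with_retry(sync_once, max_attempts=5, base_delay=0.01):
    """Call sync_once(), retrying with exponential backoff and jitter
    whenever it fails with a deadlock; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return sync_once()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Jitter keeps competing clients from retrying in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)


attempts = []


def flaky_sync():
    """Simulated sync session that deadlocks twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("chosen as deadlock victim")
    return "synced"


outcome = run_with_retry(flaky_sync)
```

A retry only hides the symptom; tuning the apply queries and indexes for the schema, as discussed in this thread, is what reduces how often deadlocks occur in the first place.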

    The concurrency characteristics have more to do with the provisioning of the commands than with the provider itself. It is tough to write a query that will work on all kinds of schemas. There is usually a lot of schema-specific fine-tuning of SQL queries in applications to make the QP gods happy. For this reason, Sync Framework has kept the SQL queries outside the product code so that developers can modify them for their specific schema.

    I am interested to know a bit more about your scenario, schema, workload and sync frequencies. We could also look at your apply queries to see if there is room for improvement in your case.

    Thanks for posting.
    Sudarshan



    Development Lead , Microsoft
    Thursday, August 27, 2009 5:20 PM
    Moderator
  • Thanks for your reply, Sudarshan.  It's good to know that running sync concurrently is intended by the Sync Framework team. 

    However, two issues make this option less attractive to me:

    1. Deadlocks cause the current sync cycle to fail, so the two databases are not in sync at the end of that cycle.
    2. The rationale for running sync in parallel isn't clear to me. If the intention is load balancing, the "SelectChanges" process certainly does not achieve it, because not only is there no ability to partition the sync data dynamically, it would also be extremely hard to add. If the intention is to have any data change trigger a sync process (this is what we wanted), then there is currently no mechanism to associate that data with the sync session it triggered. Synchronization is fundamentally a batching process. This raises the question: why would one run sync in parallel? Given that data could be synced through multiple sync sessions concurrently, with the potential for deadlocks, concurrency appears to be a pointless exercise.


    Now, in terms of our sync scenario, it's a straightforward synchronization of one datatable between two SQL2005 databases over the Internet.
     

    • The PK of the table is a numeric field. The table has 5 data columns, two of which form a unique constraint.
    • The ApplyInsert SP on the destination tests whether the row exists before inserting a new row.
    • The ApplyChangeFailed handler applies RetryWithForceWrite when LocalInsertRemoteInsert occurs.
    • The sync client inserts one record into the source table before kicking off a synchronisation. This is repeated 50 times.
    • Three instances of the sync client run concurrently, and this is when the deadlock can easily be reproduced, at the point where the destination is applying the changes.






     

    Monday, August 31, 2009 12:08 AM
  • Sorry for the late reply.

    Michael, Sudarshan gave some good points above on the concurrency.

    By running concurrent syncs, I am assuming that you mean:
    1 Server, n number of clients - multiple sync sessions running for these clients (specifically 1 sync session each between the Server and 1 client)

    Now, is the same row (same PK/unique constraint) being inserted on all 3 of your clients above, so that those rows are then attempted (and fail) at the server?
    Typically this should not be the case. If such errors/conflicts are expected, it is recommended that you use some sort of identity management mechanism to alleviate the problem. If you do have such a DML pattern, then yes, there could be deadlocks as well as DML errors and sync failures. On top of that, more complex PK/FK relationships between the tables could further complicate matters.
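One simple form of identity management is to hand each client its own block of primary-key values so that inserts from different clients never collide on the PK. The sketch below is plain Python under stated assumptions: `KeyRangeAllocator`, the block size, and the client names are all hypothetical, and this addresses only PK collisions, not a separate unique constraint on data columns.

```python
class KeyRangeAllocator:
    """Hands each client a disjoint, contiguous block of primary-key values."""

    def __init__(self, block_size=1000, start=1):
        self.block_size = block_size
        self.next_start = start
        self.assigned = {}  # client id -> (first_key, last_key)

    def allocate(self, client_id):
        """Return the client's key range, assigning a new block on first use."""
        if client_id not in self.assigned:
            first = self.next_start
            last = first + self.block_size - 1
            self.assigned[client_id] = (first, last)
            self.next_start = last + 1
        return self.assigned[client_id]


allocator = KeyRangeAllocator(block_size=1000)
ranges = {c: allocator.allocate(c) for c in ("client_a", "client_b", "client_c")}
```

Each client would then generate keys only from its own range and request a new block from the server before the current one runs out.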

    Typically the server hardware is more capable than the clients, so it can handle concurrent DML requests from them, and when the app takes care of identity conflicts, parallel synchronization can help overall data availability.
    Monday, August 31, 2009 7:36 AM

  • Thank you guys for all the valuable information.

    In my situation there is one central server and more than 100 clients. Each client works on its own specific set of data, so there is no overlap of data between clients. Around 150 tables will be synced.
    I am using SQL Server Enterprise on the server side and SQL Express 2008 on the client side.

    The data is mainly updated on the client side, but there is also some logic running on the server that can modify the data there.

    In this scenario I want to run concurrent syncs from the clients.
    Please let me know whether parallel sync will work well here. Syncing the clients one at a time would take much longer, and the number of clients will grow over time, so giving each client its own time slot will be difficult.

    What is the maximum number of clients that can sync concurrently? Any idea?

    Tuesday, September 1, 2009 6:58 AM
  • Hi Mahesh,

    Thank you for the response.

    1. The 150 tables will change often, but some of them will change less frequently. There are other tables that change only on the server side, so they are download-only.
    2. I am using V2 CTP2.
    3. My work is based on the sample provided for SQL Express.
    4. At the moment we are in a 32-bit environment. We may move later on.

    I am also trying to use an N-tier architecture for this purpose and am facing problems with SQL Express. Can you please help me with this as well?

    Thanks
    Archan
    Monday, September 7, 2009 12:11 PM
  • Archan, we can then close this thread. Please start a separate thread for the N-tier issue so that we can track it better.
    Monday, September 7, 2009 4:50 PM
  • Yes, sure. We can close the thread.

    Thanks
    Archan
    Friday, September 11, 2009 9:58 AM