Synchronizing large data stores

  • Question

  • As I understand it, each time a synchronization session runs, the knowledge about all the entities in the store must be exchanged between the providers. When synchronizing large data stores this can be a problem.

    If the stores are big and hold large numbers of entities, the knowledge can be quite large, and transferring all of it between the providers can create a performance problem.

    Is there a recommended maximum store size that keeps the size of the transferred knowledge under control?

    It would be nice if the knowledge itself could be synchronized, so that only the deltas of the knowledge were sent between the providers instead of the whole knowledge.

     

    manu
    • Moved by Max Wang_1983 Friday, April 22, 2011 5:12 PM forum consolidation (From:SyncFx - Microsoft Sync Framework Database Providers [ReadOnly])
    Friday, September 5, 2008 4:12 PM

All replies

  • Knowledge represents information for all items in the store and usually doesn't hold knowledge for individual items. At least, that's how SyncServices handles it. You group the tables you want to sync into a logical scope, and the knowledge for that scope represents all tables and rows in it. So it shouldn't matter whether your scope has 100 tables or 200,000 rows. The knowledge size will, however, grow in either of the following two cases.

     

    1. Your store has "downloaded" changes from many peers. In this case your replica key map grows, and so does your overall knowledge size.

    2. Your knowledge is fragmented. By fragmentation I mean that your knowledge has a lot of individual item-level exceptions or many range exceptions.

     

    When you combine 2 with 1 (i.e. you sync with peers that have large, fragmented knowledge), the knowledge size will grow. By how much depends on the situation.

     

    A plain vanilla SyncServices-maintained knowledge has roughly the following structure, plus some additional data (note that this is not exact and is specific to the id format that SyncServices uses).

     

    SyncKnowledge

    Replica Key Id: 4 bytes

    Replica Key Guid: 16 bytes

    ItemId: variable size, present only if you have individual item-level exceptions

    Tickcount: 8 bytes

    There are also some extra pointers in the native knowledge structure that can take up space.

     

    The actual serialized size of a DbSyncProvider knowledge that has not synced with any peer (with a ulong.MaxValue tick count) is 85 bytes.

     

    When you add one peer with the same ulong.MaxValue tick count, the size increases to 113 bytes, and to 141 bytes with a second peer (three replicas in total, counting the local one). So it appears to grow by 28 bytes (4 + 16 + 8) for each new peer.
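
    The measurements quoted above fit a simple linear estimate. The following is only a back-of-the-envelope sketch in Python, not anything Sync Framework exposes: the function name and constants are made up for illustration, `peer_count` counts remote peers only, and it assumes a range-only knowledge with no item-level exceptions.

```python
BASE_BYTES = 85              # serialized size reported above with no peers
PER_PEER_BYTES = 4 + 16 + 8  # replica key id + replica key GUID + tick count


def estimated_knowledge_bytes(peer_count: int) -> int:
    """Rough serialized size of a range-only knowledge for peer_count remote peers."""
    if peer_count < 0:
        raise ValueError("peer_count must be non-negative")
    return BASE_BYTES + PER_PEER_BYTES * peer_count


if __name__ == "__main__":
    for peers in (0, 1, 2, 100):
        print(peers, estimated_knowledge_bytes(peers))
```

    Under these assumptions even a hundred peers stay below 3 KB, so the peer count alone is unlikely to be the problem; fragmentation is what makes the knowledge large.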

     

    As I said, if you start adding individual item exceptions your knowledge will balloon. Prefer range exceptions where you can: they are manageable, and the knowledge is smart enough to collapse two ranges that sit right next to each other in id order.
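
    To illustrate why adjacent ranges are cheap, here is a minimal Python sketch of collapsing them. It is not the Sync Framework implementation: it assumes integer item ids (real item ids are byte arrays in a provider-defined format), ranges that do not overlap with different tick counts, and a hypothetical `(first_id, last_id, tick_count)` tuple layout.

```python
from typing import List, Tuple

# (first_id, last_id, tick_count), ids inclusive. Hypothetical layout.
Range = Tuple[int, int, int]


def collapse_ranges(ranges: List[Range]) -> List[Range]:
    """Merge ranges that touch or overlap and carry the same tick count."""
    merged: List[Range] = []
    for first, last, tick in sorted(ranges):
        if merged:
            prev_first, prev_last, prev_tick = merged[-1]
            # Adjacent means the new range starts right after the previous one ends.
            if prev_tick == tick and first <= prev_last + 1:
                merged[-1] = (prev_first, max(prev_last, last), tick)
                continue
        merged.append((first, last, tick))
    return merged


if __name__ == "__main__":
    print(collapse_ranges([(11, 20, 5), (1, 10, 5), (30, 40, 5)]))
```

    Two neighbouring ranges with the same tick count become one entry, whereas each item-level exception has to be stored on its own.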

     

    Hope this answers your question. Curious as to how you are using/planning on using MSF. Are you using SyncServices or other sample providers?

     

     

     

    Friday, September 5, 2008 7:22 PM
    Moderator
  • Hi

     

    Sorry, but I do not understand: what are "item level exceptions" and "range exceptions"?

     

    I read in the MSF documentation that knowledge contains information about each entity (row in the DB). That way it is possible to understand exactly what has changed and to send only the deltas (the entities that actually changed) when the final synchronization is performed.

     

    If the knowledge is per "scope", does this mean that if one row in the scope changes, the whole scope is sent when the actual data is synchronized?

     

    To answer your last question:

    MSF looks amazing !!!

    We are looking into developing custom MSF synchronization providers to synchronize data between services. The services are subscribed to a pub-sub infrastructure, so they know when something has changed and then trigger the synchronization.

     

    Another problem we face is that each service is allowed to see only part of the data. These permissions are dynamic, so we want to synchronize data between the services without letting a service receive data it should not see.

    For example:

    Service A is allowed to see European customers and Service B is allowed to see American customers, and both synchronize against one big customer repository (I cannot divide this repository because the permissions are dynamic).

    Service A will not present knowledge about American customers (A does not know about them, as it is not allowed to see them). The repository will then think these customers are missing from Service A's view of the data and send them to A, even though A is not allowed to see them.

     

    Any ideas?

     

    Thanks

     

    manu

     

    Friday, September 5, 2008 9:10 PM
  •  

    Knowledge can contain information for a range of item ids (i.e. for all items with ids between 1 and 1000, my knowledge is FOO) or for individual items (e.g. for item id 1 I know up to T, for item id 2 I know up to T3, etc.). Sync Services maintains a single range that spans all tables in the scope, hence the knowledge usually looks like Db1:1000 (for the peer with id Db1, I have sent all my local changes up to local timestamp 1000).

     

    Conflicts can occur when the destination applies changes, and one option for resolution is Skip_and_retry_next_sync. If this option is selected, the destination adds an "item exception" for the row that is skipped. For instance, in the example above, say Db1 is applying delta changes from the source. Its starting knowledge of the source was 1000. The source has now sent changes up to 2000, but for item id Id1 there was a conflict and the destination chose to skip that row. After the changes are applied, the new knowledge on the destination will be Db1:2000 (Id1:1000). This denotes that for Id1 it knows changes up to timestamp 1000, and for *all* other ids it knows up to 2000.
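
    The Db1:2000 (Id1:1000) example can be modelled as one scope-wide timestamp plus a small map of per-item overrides. The Python sketch below is only a toy model of that idea; the class and method names are hypothetical and do not correspond to the SyncKnowledge API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class ReplicaKnowledge:
    """Toy model: what is known about one peer, e.g. Db1:2000 (Id1:1000)."""
    scope_tick: int                                   # applies to every item...
    item_exceptions: Dict[str, int] = field(default_factory=dict)  # ...except these

    def known_tick(self, item_id: str) -> int:
        """Timestamp up to which changes for item_id are known."""
        return self.item_exceptions.get(item_id, self.scope_tick)

    def contains(self, item_id: str, change_tick: int) -> bool:
        """True if a change made at change_tick is already known."""
        return change_tick <= self.known_tick(item_id)

    def advance(self, new_tick: int, skipped_item_ids: Iterable[str] = ()) -> None:
        """Apply a batch up to new_tick; skipped rows stay at what was known before."""
        old_tick = self.scope_tick
        for item_id in skipped_item_ids:
            self.item_exceptions.setdefault(item_id, old_tick)
        self.scope_tick = new_tick


if __name__ == "__main__":
    knowledge = ReplicaKnowledge(scope_tick=1000)
    knowledge.advance(2000, skipped_item_ids=["Id1"])
    print(knowledge)
```

    Only the skipped row costs extra storage; every other row is still covered by the single scope-wide value. The sketch does not clear an exception once the skipped row is applied on a later sync.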

     

    In Sync Services, each row in a table records when it was last updated. The knowledge tells the source the timestamp up to which it has already sent changes to the destination. Any row whose update/create timestamp is later than the destination's timestamp is part of the delta.
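
    This also answers the "is the whole scope sent?" question: only rows newer than what the destination knows are selected. A rough Python sketch of that selection follows; the `Row` type and function name are invented for illustration, and a real provider would do this with a query against its tracking columns.

```python
from typing import Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    item_id: str
    last_change_tick: int  # per-row update/create timestamp


def enumerate_delta(
    rows: List[Row],
    known_tick: int,
    item_exceptions: Optional[Dict[str, int]] = None,
) -> List[Row]:
    """Return rows changed after what the destination already knows."""
    exceptions = item_exceptions or {}
    return [
        row
        for row in rows
        # An item exception overrides the scope-wide timestamp for that row.
        if row.last_change_tick > exceptions.get(row.item_id, known_tick)
    ]


if __name__ == "__main__":
    rows = [Row("Id1", 1500), Row("Id2", 900), Row("Id3", 2100)]
    print(enumerate_delta(rows, known_tick=2000, item_exceptions={"Id1": 1000}))
```

    With a destination knowledge of 2000 and an exception of 1000 for Id1, only Id1 and Id3 are selected; Id2 is not resent even though it belongs to the same scope.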

     

    Let me know if you need any further clarifications.

    Monday, September 8, 2008 5:55 PM
    Moderator