Storing Knowledge in a real world scenario?

  • Question

  • Hi,

    I've run into some questions about where I should store my Knowledge and would like to hear any advice anyone can give.

    The scenario (simplified) is the following:

    The client uses custom objects and a custom store. I've already implemented the client sync provider using a SqlCeMetadataStore for the metadata. That end is fairly simple; the question arises at the other end.

    The server provider will be calling a WCF service which in turn has access to the database. The metadata in the database is stored inside the tables (extra columns), similar to the DbServerSyncProvider. I can't use that provider because the other end is not a database provider. The server provider can get the data and metadata from my business objects (loaded from the database). But what do I do with my Knowledge and ForgottenKnowledge objects after the sync ends? Where does the DbServerSyncProvider save those objects, or does it generate them from scratch from the tables in each sync session? Any insight would help.

    Can I throw those objects away after each session? What drawbacks does that have?

    Am I going down the wrong path? Should I be implementing a SyncProvider rather than a KnowledgeSyncProvider? Are there any examples of doing that?

    I could let the WCF service use a SqlCeMetadataStore as well, but that would create a bunch of new problems in my multi-user scenario.

    Thanks in advance,
    Alex



    • Moved by Max Wang_1983 Thursday, April 21, 2011 10:14 PM forum consolidation (From:SyncFx - Technical Discussion [ReadOnly])
    Tuesday, February 19, 2008 8:30 PM

Answers

  • Hi Alex,

     

    Sean asked me to follow up with you on your question.

     

    Knowledge is a compact representation of the changes (i.e. change versions) a particular endpoint knows about.

     

    Knowledge is used by the framework in change enumeration and in change application/conflict detection.

     

    In change enumeration, the sender uses the destination's knowledge to determine what changes to send. Any change the destination doesn't know about (i.e. version is not contained in the destination's knowledge) is sent.

     

    In change application, the destination uses the sender's knowledge to detect conflicts. A conflict exists if one endpoint makes a change to an item while some other change is made "concurrently" (i.e. between syncs) on another endpoint. In other words, a conflict exists if one endpoint makes a change without knowing about some other change. So, for each item changed on the sender (i.e. for each item sent), we can ask: is the destination's version of that item contained in the source knowledge? If it isn't, then we have detected a conflict.
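    To make those two checks concrete, here is a minimal C# sketch. The ItemMeta type and the way items and versions are looked up are placeholders for whatever your own store exposes, and the SyncKnowledge.Contains(replicaId, itemId, version) overload used here is an assumption to verify against the reference documentation:

    ```csharp
    using System.Collections.Generic;
    using Microsoft.Synchronization;

    // Hypothetical per-item metadata record exposed by your own store.
    public sealed class ItemMeta
    {
        public SyncId GlobalId { get; set; }
        public SyncVersion ChangeVersion { get; set; }
    }

    public static class KnowledgeChecks
    {
        // Change enumeration on the sender: send only the items whose current version
        // is not already contained in the destination's knowledge.
        public static IEnumerable<ItemMeta> ChangesToSend(
            IEnumerable<ItemMeta> localItems, SyncId replicaId, SyncKnowledge destinationKnowledge)
        {
            foreach (var item in localItems)
            {
                if (!destinationKnowledge.Contains(replicaId, item.GlobalId, item.ChangeVersion))
                    yield return item;   // destination hasn't seen this version yet
            }
        }

        // Conflict detection on the destination: the destination's own version of a received
        // item must be contained in the sender's (source) knowledge; if it isn't, the sender
        // changed the item without knowing about our change, i.e. a conflict.
        public static bool IsConflict(
            SyncKnowledge sourceKnowledge, SyncId replicaId, SyncId itemId, SyncVersion destinationVersion)
        {
            return !sourceKnowledge.Contains(replicaId, itemId, destinationVersion);
        }
    }
    ```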

     

    From this we can see why it would be a bad idea to "throw away" knowledge between syncs: we would think each time that the sender doesn't know about various changes that it does indeed know about. This means false conflicts. And, especially if you are using automatic conflict resolution, you may resolve this false conflict in favour of the stale data, thus implying potential data loss. Additionally, there are performance implications -- if the destination knowledge is new (i.e. empty) each time, then the source will send everything since the destination knowledge will imply the destination doesn't know any versions.

     

    So, throwing away knowledge at the end of each sync session isn't a viable strategy.

     

    However, knowledge does not have to be stored in the file system. For example, you could store it in a table in your database. In fact, this is ideal; otherwise you have failure cases under which the knowledge could become inconsistent with the data (which, as noted above, is less than ideal). Then you can SELECT ... FOR UPDATE the knowledge at the beginning of your change-application transaction and save the updated knowledge at the end.
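    As a concrete illustration, here is a minimal sketch of that pattern against SQL Server, where the practical equivalent of SELECT FOR UPDATE is an UPDLOCK/HOLDLOCK hint. The SyncKnowledgeStore table and its ScopeName/KnowledgeBlob columns are assumptions; the knowledge itself is round-tripped through SyncKnowledge.Serialize()/Deserialize() into a varbinary(max) column:

    ```csharp
    using System.Data.SqlClient;
    using Microsoft.Synchronization;

    // Assumed schema:
    //   CREATE TABLE SyncKnowledgeStore (ScopeName nvarchar(128) PRIMARY KEY,
    //                                    KnowledgeBlob varbinary(max) NOT NULL)
    public static class KnowledgeStorage
    {
        // Read the scope's knowledge under an update lock so that concurrent sync sessions
        // serialize on this row for the duration of the change-application transaction.
        public static SyncKnowledge LoadForUpdate(SqlConnection conn, SqlTransaction tx,
                                                  string scopeName, SyncIdFormatGroup idFormats)
        {
            using (var cmd = new SqlCommand(
                "SELECT KnowledgeBlob FROM SyncKnowledgeStore WITH (UPDLOCK, HOLDLOCK) " +
                "WHERE ScopeName = @scope", conn, tx))
            {
                cmd.Parameters.AddWithValue("@scope", scopeName);
                var blob = (byte[])cmd.ExecuteScalar();
                return SyncKnowledge.Deserialize(idFormats, blob);
            }
        }

        // Persist the updated knowledge in the same transaction, just before committing.
        public static void Save(SqlConnection conn, SqlTransaction tx,
                                string scopeName, SyncKnowledge knowledge)
        {
            using (var cmd = new SqlCommand(
                "UPDATE SyncKnowledgeStore SET KnowledgeBlob = @blob WHERE ScopeName = @scope",
                conn, tx))
            {
                cmd.Parameters.AddWithValue("@blob", knowledge.Serialize());
                cmd.Parameters.AddWithValue("@scope", scopeName);
                cmd.ExecuteNonQuery();
            }
        }
    }
    ```

    The change-application transaction then brackets the whole thing: load under the lock at the start, apply the batch(es), save the combined knowledge, commit.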

     

    You also have some freedom in how you bound your transactions on the server.

     

    In change enumeration, you need to enumerate in a snapshot; otherwise you need to hold a table-level lock or do special processing. The concern is that, without this, you could enumerate changes that aren't yet contained in the source knowledge (since it hasn't been updated yet).
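    On SQL Server, for instance, one way to get such a snapshot is to run change enumeration inside a snapshot-isolation transaction (snapshot isolation must first be enabled on the database); a minimal sketch, with the enumeration queries themselves elided:

    ```csharp
    using System.Data;
    using System.Data.SqlClient;

    public static class ChangeEnumeration
    {
        // Enumerate changes against a consistent point-in-time view so that versions written
        // by concurrent syncs (and not yet reflected in the source knowledge) are not picked up.
        // Requires: ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- "MyDb" is a placeholder
        public static void EnumerateInSnapshot(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction(IsolationLevel.Snapshot))
                {
                    // ... run the change-enumeration queries on this transaction ...
                    tx.Commit();
                }
            }
        }
    }
    ```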

     

    In change application, you can bound the transaction either across the entire session or across the application of a single batch. In the latter case you would need to do your own checks to validate that knowledge or versions haven't changed underneath you during change application. (See SaveChangeContext.DestinationVersionSuppliedForChange.) If they have, you would want to skip the change with the changed version (via SaveChangeContext.RecordRecoverableError...()).
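    If you choose the per-batch option, the validation might look roughly like the fragment below. IMyItemStore and its members are hypothetical stand-ins for your own data layer, and the rest of the INotifyingChangeApplierTarget plumbing is omitted; treat this as a sketch of the check rather than the exact API shape:

    ```csharp
    using Microsoft.Synchronization;

    // Hypothetical data-layer abstraction for the destination store.
    public interface IMyItemStore
    {
        SyncVersion GetCurrentVersion(SyncId itemId);
        void Apply(SaveChangeAction action, ItemChange change);
    }

    public partial class MyChangeApplierTarget   // remaining INotifyingChangeApplierTarget members omitted
    {
        private readonly IMyItemStore store;

        public MyChangeApplierTarget(IMyItemStore store) { this.store = store; }

        public void SaveItemChange(SaveChangeAction saveChangeAction, ItemChange change,
                                   SaveChangeContext context)
        {
            // Re-read the item's current version inside this batch's transaction.
            SyncVersion currentVersion = store.GetCurrentVersion(change.ItemId);

            // If it no longer matches the destination version that was supplied for this change,
            // the row was modified underneath us by a concurrent writer.
            if (!Equals(currentVersion, context.DestinationVersionSuppliedForChange))
            {
                // Skip the item via the appropriate SaveChangeContext.RecordRecoverableError...()
                // overload so a later sync session picks it up again.
                return;
            }

            store.Apply(saveChangeAction, change);   // insert/update/delete the actual row
        }
    }
    ```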

     

    Regarding your question of "could I bypass the check-out check-in described above by just combining each concurrent sync session with the master [knowledge] object?" The answer is: not without a lot of difficulty. The problem is that each session needs to be working with current knowledge that exactly matches the data in the store. Otherwise, you could end up with false conflicts (and potential data loss), failure to detect conflicts, and potentially incorrect metadata on the clients.

     

    I would suggest using the database to manage concurrent access to knowledge. Store the knowledge in the database. Then choose your transaction boundaries depending on where you want to fall on the throughput/concurrency vs. code complexity trade-off.

     

    Hope that helps,

    Neil

     

     

    Wednesday, February 20, 2008 11:16 PM

All replies

  • OK, I now know that the ServerSyncProviders (in Microsoft.Synchronization.Data) are handled internally by the SyncAgent in a different way than KnowledgeSyncProviders. There doesn't seem to be any knowledge involved.

    But then why can Sync Services for ADO.NET sync without knowledge if it is based on the same framework?

    Can we store the SyncKnowledge objects in a database somehow, or is serializing to a binary format the only supported method at the moment?

    Any help appreciated!
    Thanks,
    Alex
    Tuesday, February 19, 2008 8:52 PM
  • Alex,

     

    For your scenario I would definitely recommend implementing KnowledgeSyncProvider for both endpoints, or leveraging the PeerSyncProvider, which does use knowledge and is specific to syncing relational databases. ServerSyncProvider was implemented prior to integrating Sync Services for ADO.NET into the Microsoft Sync Framework and therefore does not leverage knowledge to keep track of changes.

     

    Thanks,

     

    Sean Kelley

     

    Wednesday, February 20, 2008 5:14 PM
    Moderator
  • Hi,

    Great, at least I'm on track.

    But the question remains: is the following really the correct approach?
    My WCF service is used by multiple users.
    The knowledge object is globally applicable as far as I know.

    So I need to store the knowledge in a singleton manner on the server side, let each WCF call access it, but ensure mutually exclusive access for each sync session (check out before the sync session, check in after it).
    I then need to ensure that the singleton service serializes the knowledge to the file system on each check-in, in case the system goes down.

    All in all this solution just does not sound clean at all. Is this really the recommended path for such a scenario?

    I know you can combine two knowledge objects with the Combine method, so could I bypass the check-out check-in described above by just combining each concurrent sync session with the master object?

    Any other approaches?

    Thanks,
    Alex
    Wednesday, February 20, 2008 8:29 PM