Infinite Loop in SqlSyncStoreRestore.PerformPostRestoreFixup

  • General discussion

  • Hi,

    Like others, I have encountered the (dreaded) infinite loop problem in SqlSyncStoreRestore.PerformPostRestoreFixup(), which is meant to be called after restoring a SQL database.  This is an acknowledged bug (see http://social.msdn.microsoft.com/Forums/en-US/uklaunch2007ado.net/thread/bc7cbe7a-ae31-43cc-85f8-4ff6beb3e5aa), but no resolution has been forthcoming.  For any production use, the ability to restore a database is pretty important.  I would like to share my research into this problem, including how it could possibly be fixed.  I very much welcome any and all feedback.  While I can easily follow the code, and definitely know how the infinite loop occurs, my understanding of what the code is doing is less than perfect.

    The problem can only happen to those of us with more than one scope defined.  I suspect there may be other contributing factors that cause the loop to occur.

    PerformPostRestoreFixup does a number of things, but the last thing it does before committing changes to the database is to make a call to UpdateScopeKnowledges.  In fact it calls UpdateScopeKnowledges once for every defined scope in the database.  The general pattern of UpdateScopeKnowledges is:

    • Read the scope
    • Perform fixup on the knowledge in the scope
    • Write the scope
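    The call structure described above can be sketched as a small Python model.  This is not the Sync Framework code, which is internal .NET.  The function names mirror the methods in this post, and the read_scope, fix_up and write_scope callbacks are hypothetical placeholders.

```python
def update_scope_knowledges(name, read_scope, fix_up, write_scope):
    # One call per scope: the read / fix up / write pattern.
    scope = read_scope(name)       # 1. read the scope
    scope = fix_up(scope)          # 2. fix up the knowledge it holds
    write_scope(name, scope)       # 3. write the scope back


def perform_post_restore_fixup(scope_names, read_scope, fix_up, write_scope):
    # Only the final step of PerformPostRestoreFixup is modelled: one
    # UpdateScopeKnowledges call for every scope defined in the database.
    # The other fixup work and the final commit are omitted.
    for name in scope_names:
        update_scope_knowledges(name, read_scope, fix_up, write_scope)


# Usage: a dict stands in for the scope table.
store = {"A": "kA", "B": "kB"}
perform_post_restore_fixup(list(store), store.__getitem__,
                           lambda k: k + "+fixup", store.__setitem__)
print(store)  # {'A': 'kA+fixup', 'B': 'kB+fixup'}
```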

    The infinite loop actually takes place while writing the scope, in a method called SqlSyncScopeHandler.WriteScopeWithRetry, but that isn't where I think the actual bug is.  WriteScopeWithRetry attempts to write updates to the scope and checks whether any rows were updated.  If none were, it assumes that a database problem such as a concurrency failure occurred.  It then reads the scope again, merges changes, and attempts the write again, repeating until the update 'succeeds' (that is, until rows are updated).  Clearly the intent is that a row should be updated by the time we get to this method.
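    To make the retry behaviour concrete, here is a minimal sketch of a timestamp-checked (optimistic concurrency) write in Python, using an in-memory SQLite table.  Several assumptions apply.  The scope_info layout is simplified.  An integer scope_timestamp stands in for the SQL Server timestamp column and is bumped by 1 on each write.  The "re-read and merge" step is reduced to refreshing the expected timestamp.  max_attempts is a hypothetical cap, added so the sketch reports the failure instead of hanging.

```python
import sqlite3


def make_db():
    # Simplified scope table; the integer scope_timestamp stands in for
    # SQL Server's timestamp (rowversion) column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scope_info (scope_name TEXT PRIMARY KEY, "
                 "knowledge TEXT, scope_timestamp INTEGER)")
    conn.execute("INSERT INTO scope_info VALUES ('A', 'k0', 7)")
    conn.commit()
    return conn


def fresh_read(conn, name, cached_ts):
    # A re-read that really picks up the row's current timestamp.
    row = conn.execute("SELECT scope_timestamp FROM scope_info "
                       "WHERE scope_name = ?", (name,)).fetchone()
    return row[0]


def stale_read(conn, name, cached_ts):
    # A re-read that hands back the cached value unchanged.
    return cached_ts


def write_scope_with_retry(conn, name, knowledge, expected_ts, reread,
                           max_attempts=5):
    # Returns the attempt on which the UPDATE matched a row.
    for attempt in range(1, max_attempts + 1):
        cur = conn.execute(
            "UPDATE scope_info SET knowledge = ?, "
            "scope_timestamp = scope_timestamp + 1 "
            "WHERE scope_name = ? AND scope_timestamp = ?",
            (knowledge, name, expected_ts))
        if cur.rowcount > 0:
            conn.commit()
            return attempt
        # Zero rows: assume a concurrency conflict, re-read, try again.
        expected_ts = reread(conn, name, expected_ts)
    # The real method has no such cap, hence the infinite loop.
    raise RuntimeError(f"scope {name!r}: no row updated "
                       f"after {max_attempts} attempts")


conn = make_db()
print(write_scope_with_retry(conn, "A", "k1", 6, fresh_read))  # 2
```

    With fresh_read, the second attempt matches.  With stale_read, the WHERE clause never matches, so every attempt updates zero rows.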

    ReadScope has some interesting code in it.  You'll note that earlier I said this failure won't occur if you have only one scope, and the original poster in the linked thread has only two scopes.  ReadScope first reads the timestamp from the scope ('scope_timestamp').  It then compares that timestamp to a property called ScopeTimestamp.  The first time through, ScopeTimestamp has its initial value of zero.  If the just-read timestamp is greater than ScopeTimestamp, then and only then are the other scope fields read!  In addition to reading those fields, ReadScope sets ScopeTimestamp to that scope's timestamp.
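    Below is a simplified Python model of the ReadScope guard, assuming the behaviour described in this post.  ScopeRow, ScopeHandlerModel and their fields are illustrative.  They are not the real SqlSyncScopeHandler members.

```python
from dataclasses import dataclass


@dataclass
class ScopeRow:
    # Stand-in for one row of the scope table.
    name: str
    knowledge: str
    scope_timestamp: int


class ScopeHandlerModel:
    # Illustrative model of the ReadScope guard; not the real class.
    def __init__(self):
        self.scope_timestamp = 0   # ScopeTimestamp starts at zero
        self.name = None
        self.knowledge = None

    def read_scope(self, row):
        current_timestamp = row.scope_timestamp   # timestamp is read first
        if current_timestamp > self.scope_timestamp:
            # Only here are the remaining fields loaded ...
            self.name = row.name
            self.knowledge = row.knowledge
            # ... and ScopeTimestamp moved to this scope's timestamp.
            self.scope_timestamp = current_timestamp
            return True
        return False   # fields keep whatever the previous read left behind


handler = ScopeHandlerModel()
print(handler.read_scope(ScopeRow("A", "kA", 9)))  # True
print(handler.read_scope(ScopeRow("B", "kB", 4)))  # False
print(handler.name, handler.scope_timestamp)       # A 9
```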

    So the first time through, this works fine: the scope is read, updated, and written out.  The next time through, ScopeTimestamp STILL holds the value from the previous scope, so it is not likely to be less than the timestamp of the scope now being read.  Thus, no other scope fields are read!  So when we get to WriteScope, it attempts to update the scope using stale field values from the first scope.  This is especially troublesome since the WHERE clause on the update includes the scope's timestamp.  Where does the update get that value?  Not from the just-read timestamp, but from the ScopeTimestamp property, which is ONLY set when ScopeTimestamp is less than the just-read timestamp.
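    Putting the ReadScope guard and the timestamp-checked write together in a Python model shows the failure with two scopes, where the second scope's timestamp is lower than the first's.  The sketch makes three assumptions.  One handler instance is shared across scopes.  The write targets the scope currently being processed but uses the cached ScopeTimestamp in its check.  A write bumps that row's timestamp by 1.  In the real code, the zero-row result would send WriteScopeWithRetry round its loop forever; here it is simply returned.

```python
from dataclasses import dataclass


@dataclass
class ScopeRow:
    knowledge: str
    scope_timestamp: int


class ScopeHandlerModel:
    def __init__(self):
        self.scope_timestamp = 0
        self.knowledge = None

    def read_scope(self, row):
        # Fields refresh only when the row is newer than the cached timestamp.
        if row.scope_timestamp > self.scope_timestamp:
            self.knowledge = row.knowledge
            self.scope_timestamp = row.scope_timestamp

    def write_scope(self, table, name, new_knowledge):
        # Timestamp-checked UPDATE: the WHERE clause uses the cached
        # ScopeTimestamp, not the timestamp just read from the row.
        row = table[name]
        if row.scope_timestamp != self.scope_timestamp:
            return 0                  # zero rows updated
        table[name] = ScopeRow(new_knowledge, row.scope_timestamp + 1)
        return 1


def run_fixup(table):
    handler = ScopeHandlerModel()     # one handler shared by all scopes
    rows_updated = {}
    for name in list(table):
        handler.read_scope(table[name])
        # For scope B this is still built from scope A's knowledge.
        fixed = handler.knowledge + "+fixup"   # stand-in for the real fixup
        rows_updated[name] = handler.write_scope(table, name, fixed)
    return rows_updated


table = {"A": ScopeRow("kA", 9), "B": ScopeRow("kB", 4)}
print(run_fixup(table))   # {'A': 1, 'B': 0}
```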

    So that's how all this happens.  Essentially, after the first scope, later scopes are unlikely to actually be read from the database, hence there is a real problem.

    The proposed solution?  I'm not sure which is the best way to go.  Option 1, which I have tested, is to modify UpdateScopeKnowledges, adding a single line to the very start of it that sets the scope handler's ScopeTimestamp back to zero for each scope being processed, forcing every scope to be processed like the first scope.
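    Option 1 can be expressed in the same simplified Python model.  Resetting the cached timestamp at the start of each per-scope call makes every scope take the "first time through" path.  The names are illustrative; the real fix would set the scope handler's ScopeTimestamp property.

```python
from dataclasses import dataclass


@dataclass
class ScopeRow:
    knowledge: str
    scope_timestamp: int


class ScopeHandlerModel:
    def __init__(self):
        self.scope_timestamp = 0
        self.knowledge = None

    def read_scope(self, row):
        # Same guard as before: refresh only if the row is newer.
        if row.scope_timestamp > self.scope_timestamp:
            self.knowledge = row.knowledge
            self.scope_timestamp = row.scope_timestamp

    def write_scope(self, table, name, new_knowledge):
        # Same timestamp-checked write as before.
        row = table[name]
        if row.scope_timestamp != self.scope_timestamp:
            return 0
        table[name] = ScopeRow(new_knowledge, row.scope_timestamp + 1)
        return 1


def update_scope_knowledges(handler, table, name):
    # Option 1: forget the previous scope's timestamp before reading, so
    # every scope takes the same path as the first one.
    handler.scope_timestamp = 0
    handler.read_scope(table[name])
    return handler.write_scope(table, name, handler.knowledge + "+fixup")


table = {"A": ScopeRow("kA", 9), "B": ScopeRow("kB", 4)}
handler = ScopeHandlerModel()          # still shared across scopes
print([update_scope_knowledges(handler, table, n) for n in list(table)])  # [1, 1]
print(table["B"].knowledge)            # kB+fixup
```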

    A second option is to change the if statement in ReadScope from:

        if (currentTimestamp > ScopeTimestamp)

    to:

        if (currentTimestamp >= ScopeTimestamp)

    This would work as well, I believe, though I have not tested it.  I am not sure there is an advantage to NOT resetting the ScopeTimestamp.  It is possible that the code expects the value to remain set, but I haven't found that to be the case yet.
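    One way to compare the two guards in Python is to feed the same sequence of scope timestamps through each.  This simplified model ignores the write step.  In it, `>=` differs from `>` only when the next scope's timestamp equals the cached value; a next scope with a lower timestamp is still skipped.  Whether equal timestamps occur in practice depends on how scope_timestamp is generated, which the sketch does not model.

```python
def refreshed_scopes(timestamps, inclusive):
    # Feed scope timestamps through the ReadScope guard in order and record
    # whether each read would reload the scope's other fields.
    # inclusive=False models 'currentTimestamp > ScopeTimestamp' (current code);
    # inclusive=True models 'currentTimestamp >= ScopeTimestamp' (Option 2).
    cached = 0                      # ScopeTimestamp starts at zero
    refreshed = []
    for current in timestamps:
        hit = current >= cached if inclusive else current > cached
        if hit:
            cached = current        # ScopeTimestamp only moves on a refresh
        refreshed.append(hit)
    return refreshed


for ts in ([9, 4], [9, 9], [3, 9]):
    print(ts, refreshed_scopes(ts, False), refreshed_scopes(ts, True))
# [9, 4] [True, False] [True, False]
# [9, 9] [True, False] [True, True]
# [3, 9] [True, True] [True, True]
```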

    Unfortunately, if you decide to pursue this fix on your own, you will have to extract quite a bit of code (using Reflector, for example) into your own code space in order to make that single-line fix.  To get to that line, you need about 7 classes (and several enums) that are internal and thus generally out of reach.

    I am not comfortable posting my code since (a) I am not certain of the fix yet, and (b) the code I modified is copyrighted, so I can't really just post it.  However I did write to MS about the fix, and I know people on the Sync team also monitor this forum, so I am hopeful this will get some interest.

    Feel free to ask questions, and like I said earlier, all feedback is welcome.

    -Kevin

    Tuesday, March 23, 2010 7:53 PM
