Controlling Batch Size - MemoryDataCacheSize less than Useful?

  • Question

  • From the documentation:

    RelationalSyncProvider.MemoryDataCacheSize Property

    Gets or sets the maximum amount of memory (in KB) that Sync Framework uses to cache changes before spooling those changes to disk.

So I was setting this value to 500, which meant that I was typically seeing 300-400 KB batch files. Fine so far.
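For context, this is roughly how I am configuring the provider (just a sketch; the scope name, connection string, and batching directory are placeholders for my real ones):

    using System.Data.SqlClient;
    using Microsoft.Synchronization.Data.SqlServer;

    // Sketch only: "MyScope" and the connection string stand in for the real values.
    var clientConnection = new SqlConnection(@"Data Source=.;Initial Catalog=ClientDb;Integrated Security=True");
    var provider = new SqlSyncProvider("MyScope", clientConnection);

    // Spool changes to disk once roughly 500 KB of them are held in memory.
    provider.MemoryDataCacheSize = 500;                  // value is in KB
    provider.BatchingDirectory = @"C:\Temp\SyncBatches";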

    Well, I have this one sync that has a large number of records in it. No problem there either. Except....

Most of these records are in the several-KB range. However, one of them is about 2 MB. This breaks the sync because a 2 MB record won't fit within a 500 KB limit.

This means, as a practical matter, that MemoryDataCacheSize must be larger than your largest record. So if you have a record that might (just might) contain a blob of binary data, you need to account for that when setting MemoryDataCacheSize.
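The only defensive approach I can see is to measure the largest row up front and add headroom, roughly like this (a sketch that continues from the provider above; "Documents" and "Content" are placeholders for wherever your blob lives, and it still only guards against today's data):

    // Sketch: find the biggest row today and size the cache comfortably above it.
    long largestBytes;
    using (var cmd = new SqlCommand(
        "SELECT ISNULL(MAX(DATALENGTH(Content)), 0) FROM Documents", clientConnection))
    {
        clientConnection.Open();
        largestBytes = Convert.ToInt64(cmd.ExecuteScalar());
        clientConnection.Close();
    }

    uint largestRowKb = (uint)(largestBytes / 1024) + 1;

    // Keep a margin, because the next blob written may be bigger still.
    provider.MemoryDataCacheSize = Math.Max(500u, largestRowKb * 2);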

I have increased my setting to 3 MB (3000 KB). But this is really unsatisfactory for two reasons:

    a) I cannot be sure that this value is big enough for the future.

    b) This increases the size of all of my batches to this new value, which I do not want.

So it seems to me that the behaviour of MemoryDataCacheSize is not well thought out.

IMO, it should be a limit on the size of a batch, but it must also allow a single record to exceed that limit. Or at least there should be a property or setting that says, for example:

    Provider.ExceedMemoryDataCacheSizeForSingleRow = true;

    Or am I missing something?

    Saturday, March 27, 2010 2:57 PM

All replies

If you look at it, blobs are more the exception than the rule, so the memory-based batching is useful for most scenarios but not all (which is true of SyncFx itself: it covers a lot of common scenarios, but is less than perfect for everything).

Mahjayar provides some background on the memory-based batching in his blog at: http://blogs.msdn.com/mahjayar/archive/2009/09/16/msf-v2-ctp2-deep-dive-memory-based-batching.aspx

But I agree, being able to tell batching to let through a row exceeding the batch size would be nice. I had a project where we had to put the tables with blobs in a separate sync group without batching.
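Roughly what we ended up with, as a sketch (the scope names and connections are placeholders, and if I remember right, leaving MemoryDataCacheSize at its default of 0 means no batching for that session):

    // Two scopes, two sync sessions: ordinary tables batched, blob tables not.
    var orchestrator = new SyncOrchestrator();

    // Session 1: regular tables, batched at 500 KB.
    var localRegular = new SqlSyncProvider("RegularScope", clientConnection);
    var remoteRegular = new SqlSyncProvider("RegularScope", serverConnection);
    localRegular.MemoryDataCacheSize = 500;
    remoteRegular.MemoryDataCacheSize = 500;

    orchestrator.LocalProvider = localRegular;
    orchestrator.RemoteProvider = remoteRegular;
    orchestrator.Synchronize();

    // Session 2: the blob tables, with batching left off.
    orchestrator.LocalProvider = new SqlSyncProvider("BlobScope", clientConnection);
    orchestrator.RemoteProvider = new SqlSyncProvider("BlobScope", serverConnection);
    orchestrator.Synchronize();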

     

    Sunday, March 28, 2010 2:09 AM
  • June,

    Thank you for the very informative link. A couple of quotes from it:

    The runtime will try to stuff a batch with data as long as the in-memory size of those data does not exceed the specified batch size by 10%. This is to guard against sending many number of under populated batches.

    and:

    There are times when a single row would be greater than 110% of the specified data cache size. In such cases the runtime errors out with an exception message that lists the offending table and the primary key of the row that is too big. Users would need to increase the data cache size to accommodate that row.

The first one shows that Microsoft has picked an arbitrary percentage above the number that one specifies for a batch size. There is no justification for that. If I want a batch of a particular size, then it should be my decision what size I get, not Microsoft's. They could document that (in their experience) selecting 10% more than I think I want will fill the batches better, but it should be my decision. I should not have to specify 454 if I want 500 as my maximum.
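To spell out where the 454 comes from (simple arithmetic against the 10% overshoot the blog describes; provider is the same placeholder as in my earlier sketch):

    // Effective ceiling = MemoryDataCacheSize * 1.10, per the blog post.
    uint desiredCeilingKb = 500;
    uint valueToConfigure = (uint)Math.Floor(desiredCeilingKb / 1.1);   // 454

    provider.MemoryDataCacheSize = valueToConfigure;   // 454 * 1.10 = 499.4 KB effective cap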

The latter one goes to the heart of my issue. One of the key things for a developer not to do, IMHO, is make choices like this on behalf of the user, especially in a general-purpose product. This functionality must surely have been debated during its design (and if it wasn't, why not?). I think the design is, to be diplomatic, not optimal. You can't just change the batch size across a large population of clients in a production environment. This is a game breaker: it can break your production environment for every client. Even setting aside the exercise of rolling out a batch-size change to the whole population (where, for example, your company does not even own those clients), how can you tune your environment and batch size under these sorts of rules?

In my case, I cannot change the data; the application to which sync is being applied is not new. I have a table of 200,000 records that is shared by all clients (one-way sync to clients). Just as it is almost done, a couple of records exceed my original 500 KB (er, I mean 550 KB) limit. I have now upped the batch size to 3000 KB, which solves my problem, but I really didn't want to make the batches this big. Besides, once it is deployed, let's say a record is changed and it is now 4 MB. Just one record. Now the sync is broken for everybody. Maybe I can do a non-batched sync for this latter scenario, but not for the full-table sync. Differentiating between the two just adds more moving parts.

    In closing, please realize that my comments above are directed at Microsoft and not you. I am disappointed in the Microsoft solution here.

    I know there are ways around this, but there shouldn't need to be. As I said, not optimal.

So, Microsoft, you have often asked for suggestions for the next release. Here you go:

    Add the following property to RelationalSyncProvider:

    public bool ExceedMemoryDataCacheSizeForSingleRow { get; set; }

Gets or sets a directive that MemoryDataCacheSize can be exceeded for a single row during the creation of a batch. When set to false, MemoryDataCacheSize is used as the upper limit for batch size; when set to true, MemoryDataCacheSize is still the upper limit, except when a single record will not fit in a batch, in which case the batch size is exceeded to accommodate that single record change.
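To make the intended behaviour concrete, here is how I picture the batching decision with that property (purely illustrative pseudologic; none of this is actual Sync Framework code, and Batch, Row, and SpoolSingleRowBatch are made-up names):

    // Illustrative only -- not Sync Framework internals.
    void AddRowToBatch(Batch currentBatch, Row row, RelationalSyncProvider provider)
    {
        if (row.SizeKb > provider.MemoryDataCacheSize)
        {
            if (provider.ExceedMemoryDataCacheSizeForSingleRow)   // the proposed property
            {
                // Waive the cap for this one row: emit it as its own oversized batch.
                SpoolSingleRowBatch(row);
                return;
            }

            // Today's behaviour: the sync errors out, naming the table and primary key.
            throw new InvalidOperationException("Row exceeds MemoryDataCacheSize.");
        }

        currentBatch.Add(row);
    }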

    Steve

     

    Monday, March 29, 2010 1:19 PM
No offense taken :) As I have said, I'd love to see a setting to let through an offending row that exceeds the batch size, too.

Plus the ability to have each batch applied in its own transaction rather than having all batches in one transaction (which I think is being considered already).

    Monday, March 29, 2010 8:26 PM