How is batch processing supposed to work?

  • Question

  • I'm back again with yet another question.  I am getting close to finishing my prototype, but now I am having trouble with batch processing.

    Here is the short version of my question:  how is batch processing supposed to work?  Specifically, when a client application is putting together a lot of data to send to the server, does it first build the batch file locally and then send the data to the server?  If not, what is the expected behavior?

    The Longer Version

    My test configuration consists of two client databases running on my PC and a server database running on a test server.  I have a SyncService running on the test server.  My prototype implementation is based on the WebSharingAppDemo-SqlProviderEndToEnd demo application.  Smaller syncs are working fine, but larger syncs that force batch processing are failing and the reason they are failing is because the ApplyChanges method in my sync service isn't finding any batch files to process.

    The specific test case I am trying to execute is that I have created 100,000 records in one client database (call it "client1").  I want to sync these records to the server database and then sync them to a second client database (call it "client2").  Once that has been successful, I intend to delete these same 100,000 records in "client1", sync the deletions to the server database, and then sync them to "client2".

    What I am seeing is that batch files are being built in the batching directory for "client1" but never getting moved/copied/created in any batching directory associated with the server.  This looks suspiciously like a coding error on my part.

    My real question is this: how is batch processing supposed to work?  Specifically, should "client1" be creating batch files locally and then sending these batch files to the server?  Or should the data simply be sent to the server and it can create its own batch files?  Is there a description of how this should work written down anywhere?

    On a side note, I had a great deal of difficulty getting my sync service to work on the test server using the WSHttpBinding.  I ended up switching to net.tcp binding and that was significantly easier to use.  I have not seen any mention of using this protocol on these forums.  Is there some reason to not use net.tcp binding?

    Thanks for the guidance and assistance.  I am very close to finishing my prototype and am reasonably confident that we will be able to use MSF in our application. 

    Tuesday, April 27, 2010 1:26 PM

Answers

  • The problem in your config is this line:

    <readerQuotas maxArrayLength="100000"/>

    This sets the maximum array size to 100 KB, which is far smaller than the 867 KB batch file that gets serialized into a single array. You can change it to match maxReceivedMessageSize, i.e. 10485760, which is about 10 MB.

    If you apply the same change to your wsHttpBinding, it should work as well.
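    Concretely, only the quota line needs to change; the other values shown here mirror the netTcpBinding posted earlier in the thread:

```xml
<binding name="noSecurityNeeded"
         maxBufferSize="10485760"
         maxReceivedMessageSize="10485760">
  <!-- raised from 100000 so a serialized batch file (~867 KB) fits in one array -->
  <readerQuotas maxArrayLength="10485760"/>
</binding>
```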

    You are right: the current WebSharing sample model doesn't work with basicHttpBinding because that binding doesn't support WCF sessions. Sessions must be supported in some way for sync to work, as they are in this WebSharing sample via WCF sessions. In other cases, web server storage (such as Windows Azure storage) can be leveraged to support sessions. We will be working on a sample that shows how that works and is capable of using basicHttpBinding.

     

     

    • Marked as answer by PuzzledBuckeye Wednesday, April 28, 2010 5:24 PM
    Wednesday, April 28, 2010 5:00 PM
    Answerer

All replies

  • When you are using batching with the SqlSyncProvider, the batch files are generated on the sending side. They are uploaded to the server or downloaded from it through the WCF operations UploadBatchFile and DownloadBatchFile, which are tied to the ApplyChanges/ProcessChangeBatch and GetChangeBatch/GetChanges operations respectively.

    When sending batch files to the service, the client's proxy provider ProcessChangeBatch is called. It calls serviceProxy.UploadBatchFile if needed and then serviceProxy.ApplyChanges; because the service knows the batch file has already been uploaded, it rewrites the DbSyncContext's batch file path to the (service-side) local batch file path.

    When receiving batch files from the service, the client's proxy provider GetChangeBatch is called. It calls serviceProxy.GetChanges, then serviceProxy.DownloadBatchFile if needed, and finally corrects the DbSyncContext's batch file path to the local batch file path.

    Using HTTP is a more general approach than net.tcp because most firewalls allow it; that is why the sample uses it.
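    The two directions can be sketched in VB-style pseudocode (member names follow the WebSharingAppDemo sample where I know them; the helper variables such as localBatchDir are illustrative assumptions, so treat this as a sequence diagram in code form rather than the sample's exact source):

```vb
' Upload direction: client -> service (inside the proxy provider's ProcessChangeBatch)
If context.IsDataBatched Then
    If Not proxy.HasUploadedBatchFile(fileName, peerId) Then
        ' Send the locally generated batch file to the service.
        proxy.UploadBatchFile(fileName, File.ReadAllBytes(context.BatchFileName), peerId)
    End If
    ' The service then resolves context.BatchFileName against its own batch directory.
End If
Dim stats = proxy.ApplyChanges(resolutionPolicy, sourceChanges, context)

' Download direction: service -> client (inside the proxy provider's GetChangeBatch)
Dim serverContext = proxy.GetChanges(batchSize, knowledge)
If serverContext.IsDataBatched Then
    ' Pull the service-side batch file down and point the context at the local copy.
    Dim localPath = Path.Combine(localBatchDir, serverContext.BatchFileName)
    File.WriteAllBytes(localPath, proxy.DownloadBatchFile(serverContext.BatchFileName))
    serverContext.BatchFileName = localPath
End If
```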

     

    Tuesday, April 27, 2010 5:45 PM
    Answerer
  • Let me describe my problem in a little more detail now that I understand this better and then ask a couple more questions.

    When I request the sync operation, I can see initialization activity going on between the client application and the service and we quickly get to the point where the sync framework calls ProcessChangeBatch() on my provider proxy.  When this occurs, the following sequence of events takes place:

    1. The DbSyncContext object passed into ProcessChangeBatch() indicates that we are doing batching.
    2. If I pause the client application at this point via a debugger breakpoint, I can see that there are two files in the client's batch directory, a "<guid>.batch" file and a file named "SyncBatchHeaderFile.sync".  The ".batch" file is about 847 KB in size.
    3. The ProcessChangeBatch() method then calls the "HasUploadedBatchFile()" method on the proxy.  The service receives this call and correctly returns "False" since it does not have this batch file yet.
    4. The ProcessChangeBatch() method then streams the contents of the ".batch" file into a Byte array and calls the proxy's UploadBatchFile() method to send the Byte array over to the service.
    5. The ProcessChangeBatch() method then calls the proxy's ApplyChanges() method to process the batch file.

    The problem is occurring on step 4: even though the call to proxy.UploadBatchFile() "succeeds" in the sense that the call returns and does not throw an exception, the corresponding method in the service is never called (all of my service methods log their entry and I have placed a breakpoint on the first line of the service's UploadBatchFile() method, so I know it is not getting called).  Therefore, the service never copies the Byte array to its batch file directory, and the subsequent call to ApplyChanges() causes the service to throw an exception because it has not actually received the batch file.

    On an earlier test, I had messed up the buffer sizes on the NetTcpBinding and the call to proxy.UploadBatchFile() would actually deadlock until the wait timer expired.  Once I fixed that, the call allegedly succeeds, but no data is actually being sent over.  This problem only occurs during batch processing, which suggests that there is something wrong with either the way I have coded the UploadBatchFile() call or the way I have configured my NetTcpBinding.

    Here is the code (Visual Basic) for the UploadBatchFile() call.  I have removed error checking and logging, etc.:

    ' Name of the batch file and the replica that produced it.
    Dim fileName As String = New FileInfo(context.BatchFileName).Name
    Dim peerId As String = context.MadeWithKnowledge.ReplicaId.ToString()

    ' File.ReadAllBytes reads the entire file and closes it, avoiding a
    ' partial Stream.Read and a leaked file handle.
    Dim contents() As Byte = File.ReadAllBytes(context.BatchFileName)

    Me.proxy.UploadBatchFile(fileName, contents, peerId)

    ' Leave only the file name so the service resolves it against its own
    ' batch directory.
    context.BatchFileName = fileName

    Dim stats As SyncSessionStatistics = Me.proxy.ApplyChanges(resolutionPolicy, sourceChanges, changeDataRetriever)
    

    Here is the configuration of my NetTcpBinding (Note that I am NOT a protocol expert):

        <bindings>
          <netTcpBinding>
            <binding name="noSecurityNeeded"
                     closeTimeout="00:30:00"
                     openTimeout="01:00:00"
                     receiveTimeout="01:00:00"
                     sendTimeout="01:00:00"
                     transferMode="Buffered"
                     maxBufferPoolSize="104857600"
                     maxBufferSize="10485760"
                     maxReceivedMessageSize="10485760">
              <reliableSession enabled="true" />
              <security mode="None" />
              <readerQuotas maxArrayLength="100000"/>
            </binding>
          </netTcpBinding>
        </bindings>

    The actual file size of the .batch file is 867,101 bytes which seems like it should easily fit into a message buffer size of 10,485,760.

    As far as using the net.tcp protocol is concerned, I have very little knowledge about protocols in general.  What I want is a transport mechanism that is reliable, efficient, and easy to configure.  I found WSHttpBinding too painful to configure, and I got an error when I tried to use basicHttpBinding because it doesn't support SessionMode, which is required according to the original ISync contract.  NetTcpBinding meets all of my objectives and has the added benefit that our existing sync capability already uses this protocol, so we have already opened a hole in our firewall to support it.  However, if there is something better I can use, I am open to the suggestion.

    So, my question is this: do I really need to support SessionMode?  In our environment, a sync request could take over an hour (right now I am guessing our worst case could take 4.5 hours) and we will have multiple users syncing at the same time.  This suggests to me that we want SessionMode, which is what was specified in the original WebSharingAppDemo.  But if this is not required, maybe I could switch to basicHttpBinding.
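    For context, the session requirement I ran into is declared on the service contract itself, roughly like this (VB form of the attribute; the interface name here is my guess at the sample's contract, so adjust it to whatever your project uses):

```vb
' SessionMode.Required means the binding must support sessions:
' netTcpBinding and wsHttpBinding (with reliable session or security
' sessions) do, while basicHttpBinding does not.
<ServiceContract(SessionMode:=SessionMode.Required)>
Public Interface ISyncContract
    ' ... sync operations (ApplyChanges, GetChanges, UploadBatchFile, ...)
End Interface
```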

    Again, I really appreciate any clarification, advice, or insight you can offer.

     

    Wednesday, April 28, 2010 12:41 PM
  • Thanks for the information.  About 19 minutes after you posted, I had figured out that the maxArrayLength setting was in fact my problem (I used WCF event tracing to capture the traffic, and it output an exception indicating that the setting was wrong).  I was in the process of trying to figure out what to set it to when I decided to take a quick peek at this forum to see if anything had been added to my post.  Fortunately, you responded :).

    This should get me up and running again; either I am well on my way to finishing this, or I will simply run into the next difficulty. 

    I really appreciate the response.  Thanks again.

     

    Wednesday, April 28, 2010 5:24 PM