MSS throttle issues

  • Question

  • Over the weekend we were making some outbound calls.  We had the throttle set at 40 and MSS was doing fine throttling the calls at 40.  However, as soon as we got a couple calls in the queue for another application, MSS started throttling at 80 with 78 calls for the first application and two for the second. 

     

    We had MSS set to throttle at 40 because that's what our SIP peer can handle.  I thought that was the purpose of the throttle, so this seems like a bug to me.

     

     

    The next issue concerns the SipPeerException.ShouldMssThrottleCalls property.  This can be very useful, but sometimes we get a TelephonyException rather than a SipPeerException, and I was wondering whether there is a way to have MSS adjust the throttle for other exception types such as TelephonyException.  I understand the difference between the types of exceptions, but occasionally we get a TelephonyException when the SIP peer is down and the request times out.  In this case it would be very useful to have MSS adjust its throttle.  Is there any way to do this?

     

     

    The last issue I had was with how MSS throttles calls when there is a SipPeerException that has ShouldMssThrottleCalls set to true.  Our throttle was set to 40 and we were making calls.  We then got an exception for which we wanted MSS to adjust the throttle down, so we set ShouldMssThrottleCalls to true.  I was watching the Active Application Instances performance counter at the time.  MSS immediately stopped submitting new calls and waited for the existing calls to finish until there were about two left.  Then it submitted more until there were about 16, and then it let those all finish until there were just a few left.  Then it submitted more until there were about 16 and let most of those finish.  It continued this for about 10 minutes and then gradually started going back up to 40 calls. 

     

    Prior to this, MSS had been keeping about 40 calls in progress at all times.  My question is, when you set ShouldMssThrottleCalls to true, is there any way to just have MSS lower its throttle by a couple and keep the calls steady at the lower throttle.  As it is, it slows down drastically and then takes about 15 minutes to build back up to the normal throttle. 

    Monday, August 20, 2007 4:45 PM

Answers

  • Sorry for the delay in responding, I missed this post.

     

    When you say "throttle set at 40" what do you mean? 

    If you are using "Admin Console -> Servers -> <your server> Properties -> Advanced -> Call throttling -> Maximum outgoing", this should apply across all applications on a single machine, and the behaviour that you observe is unexpected (and I have failed to reproduce it myself).  If you have 2 applications on 2 separate machines then the behaviour you observe is expected because there is no communication between the 2 instances of Speech Server. 

     

    If you are using your own throttling mechanism using a static counter, then again this is the behaviour you'd expect, since the 2 applications live in separate app domains and so each have their own static counter. 

     

    SipPeerException.ShouldMssThrottleCalls. 

    I'm afraid that this is only available for SipPeerExceptions.  I have noted this unfortunate restriction.

     

    "when you set ShouldMssThrottleCalls to true, is there any way to just have MSS lower its throttle by a couple and keep the calls steady at the lower throttle"

    The algorithm employed is as follows: when you set ShouldThrottleMsmqCalls to true, Speech Server reads the current number of active outgoing calls for this application instance, and sets the upper limit to this.  This limit is maintained for 15 mins, after which the upper limit is increased by 1 every 10 secs.  If, during this period, you set ShouldThrottleMsmqCalls to true again, the process starts again with the current number of active outgoing calls for this application instance. 

    An application instance counts towards this limit from the moment a message is noticed in the queue to when it disconnects (or completes with no connection).  When Speech Server notices a message in the queue, it does some resource allocation, then retrieves the message and increments NumApplicationInstances.  NumOutboundCalls is then incremented when the session connects.  Assuming that the server is not under load from inbound calls, the resource allocation should be quick.

    Finally, it will take up to 10 secs following a disconnect before Speech Server looks for another message in the queue.  This is because watching the current number of active outgoing calls is done on a 10 sec timer.
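
To make the timing concrete, here is a rough Python model of the limit described above. It is only a sketch, not Speech Server code: the function names are made up for illustration, and the cap at the configured maximum is an assumption based on the observation that the call count eventually returned to 40.

```python
HOLD_SECONDS = 15 * 60   # the limit is held for 15 minutes (from the description above)
RAMP_STEP_SECONDS = 10   # afterwards the limit rises by 1 every 10 seconds


def throttle_limit(active_at_trigger, configured_max, seconds_since_trigger):
    """Upper limit on outgoing calls at a given time after throttling was triggered.

    Assumption: the limit stops rising once it reaches configured_max.
    """
    if seconds_since_trigger < HOLD_SECONDS:
        return min(active_at_trigger, configured_max)
    steps = (seconds_since_trigger - HOLD_SECONDS) // RAMP_STEP_SECONDS
    return min(active_at_trigger + steps, configured_max)


def seconds_to_recover(active_at_trigger, configured_max):
    """Time from the trigger until the limit is back at configured_max."""
    missing = max(configured_max - active_at_trigger, 0)
    if missing == 0:
        return 0
    return HOLD_SECONDS + missing * RAMP_STEP_SECONDS
```

Under this model, a trigger with 16 active calls and a configured maximum of 40 gives `seconds_to_recover(16, 40) == 1140`, i.e. 19 minutes before the limit is back at 40.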

     

    In your example, it looks like Speech Server is throttling at 16 and the "let those all finish until there were just a few left" is due to the 10 sec timer delay.

     

    There is no way to alter this algorithm.  However, my suspicion is that you are observing extreme behaviour because your calls are short relative to the time it takes to establish a call.  If your calls were longer, the number of existing outbound calls would be higher, and the 10 sec delay less significant.  Having said this, you raise an interesting blemish: Speech Server should really be including the number of sessions trying to establish a connection, not just the ones connected.

    Monday, September 3, 2007 12:45 PM

All replies

  • Anyone?

     

    These are significant issues for us and we'd appreciate some help with them.

     

    Friday, August 31, 2007 9:51 PM
  • Hi,

     

    Thanks for replying.

     

    We set the MSS throttle on one machine to 40.  We had calls going out on one application and then when calls came in on another application the throttle doubled to 80.  This was on one machine.  However, we haven't seen this happen in a while so I'm thinking that a bug in our SIP peer may have been a contributing factor.  If I see this happen again, I'll repost and try and get some steps to reproduce.

     

    Is there any sort of workaround to have MSS throttle for exceptions other than SipPeerExceptions?  As it is now, if our SIP peer goes down, MSS will burn through hundreds of calls in minutes. 

     

    It would be nice if in a future release the throttle algorithm would include the number of connecting calls rather than just the connected calls.  It makes more sense because the number of ports one has available with a SIP terminator or gateway includes connecting calls, not just connected calls. 

    Tuesday, September 4, 2007 2:09 PM
  • I can think of 2 workarounds, neither ideal: (1) implement your own throttling logic, or (2) use reflection to trigger Speech Server throttling.

     

    (1) You can stop Speech Server pulling messages from the queue by using WMI to set either of the following properties and invoke RefreshSettings:

    i) <your Application>.NotificationMessageQueue to String.Empty

    ii) <your trusted Sip Peer>.AllowOutboundCalls to false

     

    When you receive the TelephonyException, set either of these, start a timer, and reset the value when the timer fires.  You could put the timer on a static object or within an individual application instance.  With a static timer you need to ensure that the value gets reset if a recycle occurs whilst the timer is active; use global.asax to wire up Application_Start or Application_End to check this.  If you use an individual instance, that problem is easy (make sure you reset the timer in ((IHostedSpeechApplication)Class1).Stop); however, you need to ensure that if multiple instances detect the TelephonyException, only 1 starts the timer.  Also note that, when you reset the value, you may get multiple application instances attempting to place a call simultaneously which, if the SipPeer is still unavailable, will all fail.
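
The "only 1 starts the timer" requirement can be sketched as below. This is an illustrative Python model of the coordination logic only, not Speech Server or WMI code: `QueuePause` is a made-up name, and the `pause` and `resume` callables are hypothetical stand-ins for whatever sets and restores the WMI property.

```python
import threading


class QueuePause:
    """Pause queue processing once, however many instances report a failure."""

    def __init__(self, pause, resume, delay_seconds):
        self._pause = pause        # hypothetical: e.g. clear the queue setting via WMI
        self._resume = resume      # hypothetical: restore the original setting via WMI
        self._delay = delay_seconds
        self._lock = threading.Lock()
        self._timer = None

    def report_failure(self):
        """Return True if this caller started the pause, False if one is already active."""
        with self._lock:
            if self._timer is not None:
                return False
            self._pause()
            self._timer = threading.Timer(self._delay, self._on_timer)
            self._timer.daemon = True
            self._timer.start()
            return True

    def _on_timer(self):
        with self._lock:
            # Ignore a stale timer that was cancelled or replaced while it waited for the lock.
            if self._timer is not threading.current_thread():
                return
            self._resume()
            self._timer = None

    def cancel(self):
        """Call on shutdown or recycle so the setting is never left disabled."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._resume()
                self._timer = None
```

Each instance that catches the exception would call `report_failure()`; only the first call pauses the queue and starts the timer. `cancel()` corresponds to the reset-on-recycle step described above.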

     

     

    (2) Use Reflection to invoke the internal method MsmqListener.OnGatewayCapacityReached() (in Microsoft.SpeechServer.Core.dll).  This is identical to setting ShouldThrottleMsmqCalls on SipPeerException.  You must do this within the OpenCompleted callback.  The downside to this is the use of reflection to access an internal method - there's no guarantee that this will continue to be valid in future releases.

     

     

     

    Wednesday, September 5, 2007 1:10 PM
  • I have an explanation for the 40 to 80 jump.  Were you collecting performance counters at the time?  Did it coincide with a process recycle?  And did it drop back to 40 after a while?

     

    The max outgoing calls throttle is applied by SesWorker; when this is recycled each instance applies the limit separately, so there is a period whilst the old instance is winding down when there will be up to 80 simultaneous channels.  As calls associated with this old instance complete, it'll gradually drop back down. 

     

    What I think happened for you is that a SesWorker process recycle coincided with your 2nd application receiving messages.  If you have .etl logs, NT Application Event logs or performance counters for the period, they would indicate whether this is the case.

     

    Did you end up draining the message queue when this happened, or were you able to set ShouldThrottleMsmqCalls to prevent it?
    Wednesday, September 5, 2007 7:38 PM
  • Hi Anthony,

     

    Thanks for the detailed responses.  It's good to get some insight into what's going on under the hood.

     

    I was watching perfmon at the time, and it did fall quickly back under 40, but I'm not sure if it coincided with a recycle.  That makes sense.  I also built a throttle into my sip proxy so that new invites get a busy and result in the MSS throttle triggering, so that also explains a rapid dropoff after the jump over 40.

     

    We have a bit of a catch-22 when we are reaching the throttle limits.  If I let MSS hit its throttle, we have a problem with supervised transfers (sectrean has posted about this here previously), since there are no open channels for the transfer.  But if I let the proxy throttle get hit first, MSS slows down so much that calls really back up.

     

    The throttles coded into my middleware layer are designed more on a per-client basis, depending on agreements about call volume, so I'd rather not change them too much.

     

    The easy but expensive solution is just to have way more capacity than I need.  Once I get to the point where I'm load balancing between MSS instances, I will build in a bit more control from the middleware to keep from over-taxing any one box, but it would be nice to push the limits and maximize hardware without worrying too much when we peak above the throttle occasionally.

     

    Overall we've been pleased with scalability of 2007.  It's much better than 2004, and call sound quality is improved.  Do you have any figures on what 2007 can handle (theoretically)?  I wouldn't want to run my 2004 machines over 30 simultaneous calls, but I'm already well over that with 2007, and that's with all logging and tracing on in mss and the proxy.  What's the limit?  100?  200?  Is it purely a matter of resources?

     

    Just one more related question: We've noticed that when you drop new code onto a machine while calls are active, those calls get dropped.  I was surprised by this, since with ASP.NET if you update something like web.config, existing requests are allowed to finish and new requests are seamlessly handled by the new code.  Is this the intended behavior?  Of course it's not exactly best practice to hotfix production code, but sometimes you find a bug in a workflow and you just want it fixed asap.

     

    Thanks,

     

    EZB

    Wednesday, September 5, 2007 9:15 PM
  • "We've noticed that when you drop new code onto a machine while calls are active, those calls get dropped.  "

    As you notice, dropping new code in triggers a recycle.  Existing requests are allowed to finish; however, by default ASP.NET only gives them 30 secs.  You need to add the following to your web.config to give them 5 minutes (or whatever value is appropriate for your app).  You also need to make sure that the shutdown timeout for your application pool is 1 minute longer.  The Speech Application application pool is set to 6 mins.

     

    <configuration>
      <system.web>
        <hostingEnvironment shutdownTimeout="300" />
      </system.web>
    </configuration>

     

    "Do you have any figures on what 2007 can handle (theoretically)?"

    There's a document here: http://download.microsoft.com/download/a/4/4/a441d741-ff76-40b2-8f7c-c1b799fd94f0/Capacity planning for OCS Speech Server 2007 Deployments.docx.  You've probably seen the 2004 version of this doc.  As with 2004, it really does depend on your app, so you need to do your own measuring.  Speech Server employs no hard limit, so yes, the limit is purely a matter of resources. 

     

     

    "But if I let the proxy throttle get hit first, MSS slows down so much that calls really back up."

    The scenario that MSMQ is designed for is when messages are dumped into the queue periodically for Speech Server to process as fast as it can safely do so.  We assume that the odd MakeCall failure is ok (i.e. that the application will either retry the MakeCall or the backend will put a new message in the queue), but that it shouldn't burn the queue dry.  In addressing the latter we assume that it is better to be sluggish at getting back to capacity than to sacrifice too many messages trying to get back aggressively.  As such, the throttling algorithm does assume that there is some flexibility in the system for a temporary drop in the number of channels.

     

    From your description, it sounds like you have control over the SipPeer and the backend that puts messages into the queue.  Given this, can you add communication between the SipPeer and the backend?  If so, you can beat any general purpose throttling algorithm that Speech Server could employ.  The solution I have in mind is:

    1. Use http POST to the .speax to trigger the outbound application instead of MSMQ.
    2. Have the SipPeer keep the backend up to date with the current number of outbound channels.
    3. Only issue POST requests from the backend when the SipPeer has capacity.
    4. Don't set a maximum number of outgoing calls on Speech Server

    This also has the advantage that you can take into account inbound applications that may wish to transfer, and ensure that there is capacity at the SipPeer to handle them.  You also don't need to worry about detecting a non-existent SipPeer from the application because the backend can know this itself.  Note that Speech Server will reject http requests if it determines that it's currently too heavily loaded (it watches various latencies to determine this).
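
A minimal Python sketch of the backend side of steps 2 and 3 might look like this. Everything in it is hypothetical: the class name, the `post` callable (standing in for the http POST to the .speax, returning True when the request is accepted), and the idea of reserving a fixed number of channels for transfers are assumptions for illustration only.

```python
from collections import deque


class OutboundDispatcher:
    """Backend gate that only triggers outbound calls while the SIP peer has capacity."""

    def __init__(self, peer_capacity, reserved_for_transfers, post):
        self._capacity = peer_capacity
        self._reserved = reserved_for_transfers  # channels kept free for inbound transfers
        self._post = post       # hypothetical: POSTs one request, returns True if accepted
        self._in_use = 0        # channels in use, as last reported by the SIP peer
        self._pending = deque()

    def enqueue(self, call_request):
        self._pending.append(call_request)
        self._dispatch()

    def on_peer_update(self, channels_in_use):
        """Called whenever the SIP peer reports its current channel count."""
        self._in_use = channels_in_use
        self._dispatch()

    def _dispatch(self):
        while self._pending and self._in_use < self._capacity - self._reserved:
            if not self._post(self._pending[0]):
                break  # request rejected (e.g. server too loaded); retry on a later update
            self._pending.popleft()
            self._in_use += 1  # count the connecting call until the peer's next report
```

Counting a call as in use from the moment the POST is accepted means connecting calls are included, not just connected ones, which addresses the concern raised earlier in the thread.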

    Thursday, September 6, 2007 10:54 AM
  • Thanks Anthony!  This is a good thread.  I have a lot of new information and plenty of changes to consider.

     

    EZB

     

    Thursday, September 6, 2007 12:30 PM