How to properly cancel an SOA HPC job?

  • Question

  • We have an SOA HPC service that's working just fine, using Session.CreateSession and a BrokerClient. When we want to cancel a job, we close the BrokerClient and the job "Finishes" when all the tasks currently executing are finished.

    Unfortunately, some tasks take much longer than others to complete, so if the request to cancel comes in while one of these tasks is executing, there's a big delay until the job is actually declared "Finished".

    We've noticed that in HPC Cluster Manager we can manually "Cancel" the job directly, and the job seems to become "Cancelled" almost immediately, killing the long-running task.

    How do we programmatically achieve the same effect from our SOA client? It seems like IScheduler.CancelJob is the right thing to call, but it's not clear how to actually do that via a Session or a BrokerClient. Or is there some other way to "Cancel" an SOA job, other than closing the BrokerClient?



    • Edited by wbradney Monday, May 2, 2016 3:03 PM
    Monday, May 2, 2016 3:02 PM

Answers

    Hi wbradney,

    For interactive sessions created by Session.CreateSession(), if the session is not shared, the implicit close performed on Dispose uses purge = true; this behavior can be changed by setting the Session.AutoClose property to false. If the session is shared, the implicit close on Dispose uses purge = false. For durable sessions created by DurableSession.CreateSession(), the implicit close on Dispose always uses purge = false, so that the durable session can be attached later to retrieve the responses.

    For BrokerClient, neither disposing it (which does not call Close()) nor calling the default Close() (which calls Dispose()) implicitly purges the requests/responses on the broker side, so another broker client with the same client ID can be created later to retrieve the responses. If you want to purge the requests/responses on the broker side for a broker client, you need to call BrokerClient.Close with the purge parameter explicitly set to true.
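A minimal sketch of making these close semantics explicit instead of relying on Dispose, assuming the HPC Pack SOA client API in Microsoft.Hpc.Scheduler.Session. The IMyService contract, the "MyService" service name, the "client-1" client ID, and the RunWithExplicitPurge method are hypothetical placeholders; a real client would use its own generated service reference and registered service name.

```csharp
using System.ServiceModel;
using Microsoft.Hpc.Scheduler.Session;

// Hypothetical service contract; a real client uses its generated service reference.
[ServiceContract]
public interface IMyService
{
    [OperationContract]
    string Echo(string input);
}

public static class CloseSemanticsSketch
{
    // Hypothetical helper: headNode is the cluster head node name.
    public static void RunWithExplicitPurge(string headNode)
    {
        // "MyService" is a hypothetical service name registered on the cluster.
        var startInfo = new SessionStartInfo(headNode, "MyService");

        using (Session session = Session.CreateSession(startInfo))
        {
            // A fixed client ID is what lets another BrokerClient with the same ID
            // retrieve responses later, as long as they have not been purged.
            using (var client = new BrokerClient<IMyService>("client-1", session))
            {
                // ... send requests and read responses here ...

                // Purge this client's requests/responses on the broker.
                // Close() or Dispose() alone would leave them in place.
                client.Close(true);
            }

            // Close the session itself with purge, rather than relying on the
            // implicit close done by Dispose (which depends on AutoClose and on
            // whether the session is shared or durable).
            session.Close(true);
        }
    }
}
```

Closing the session, not only the broker client, is the step that releases the session's broker resources; purging one broker client concludes only that client.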

    I understand your requirement that admins can easily tell from the job state which sessions were canceled rather than finished. Currently, whether the session is closed before or after all requests have been processed, the job for the SOA session ends up in the Finished state. Admins can only tell whether the session completed (defined as all requests processed) by comparing the job's NumberOfCalls and NumberOfOutstandingCalls properties.

    Your solution of calling CancelJob is a feasible way to conclude the job in a cancelled state. In addition, I suggest closing the session with an explicit purge after canceling the job, so that the requests/responses on the broker side are cleaned up from memory or disk and the related broker resources are freed promptly.
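A minimal sketch of the sequence suggested above (cancel the service job first, then close the session with purge), assuming the HPC Pack scheduler API (Microsoft.Hpc.Scheduler) and SOA session API. The CancelAndPurge helper and its parameters are hypothetical; the service job ID is expected to come from wherever the client already gets it (as in the client code posted in this thread), and the three-argument CancelJob call mirrors that code.

```csharp
using Microsoft.Hpc.Scheduler;
using Microsoft.Hpc.Scheduler.Session;

public static class SessionCancellation
{
    // Hypothetical helper: ends the service job in the cancelled state,
    // then purges the session's requests/responses on the broker.
    public static void CancelAndPurge(Session session, int serviceJobId, string clusterName)
    {
        IScheduler scheduler = new Scheduler();
        try
        {
            scheduler.Connect(clusterName);
            // Same overload as the client code in this thread; the third
            // argument is assumed to request a forced cancel.
            scheduler.CancelJob(serviceJobId, "Cancelled by user", true);
        }
        finally
        {
            scheduler.Close();
        }

        try
        {
            // Explicit purge after the cancel, so broker resources are freed
            // now rather than when the broker notices the cancelled job.
            session.Close(true);
        }
        catch (SessionException)
        {
            // The broker may already be reacting to the cancelled job, so the
            // close can fail; ignored in this sketch, a real client would log it.
        }
    }
}
```

Cancelling before closing is what leaves the job in the cancelled state for admins; the purge afterwards only handles broker-side cleanup.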

    BR,

    Yutong Sun

    • Marked as answer by wbradney Saturday, May 7, 2016 11:37 AM
    Saturday, May 7, 2016 6:31 AM

All replies

    Hi wbradney,

    I would suggest using SessionBase.Close(true) to close a session, which deletes the response messages and finishes the service job. Please see the API reference here.

    BR,

    Yutong Sun

    Tuesday, May 3, 2016 9:03 AM
  • This doesn't actually "cancel" the job, though (just "Finishes" it). This is effectively the same as what we're currently doing, which is closing the BrokerClient.

    I think the closer approach would be to create and connect a new Scheduler and call CancelJob(), and I'm testing that now. Let me know if you see any problems with that approach.

    It's surprising to me that true cancellation isn't a first-class concept in either the Session or the BrokerClient.
    Wednesday, May 4, 2016 1:32 PM
    Hi wbradney,

    SessionBase.Close() is the formal way to conclude a session. It calls the broker service and broker worker on the broker node to delete the request/response queues and release the related resources while finishing the service job. Closing a BrokerClient with purge set to true only concludes that one broker client, of possibly many in a session, and it cannot release the session resources, so it is not equivalent to closing the session. If you have some long-running requests, SessionBase.Close(true) would finish the session job within the Task Cancel Grace Period, which is a global scheduler setting. To handle canceling long-running requests gracefully on the service host and client side, there is sample code named EchoService/HelloWorldR2CancelRequests in the Microsoft HPC Pack 2012 R2 SDK and Sample Code. You may check it out.

    As for calling CancelJob() directly through the scheduler API, it is a feasible but rough way of finishing a session. It leaves the broker service and broker worker to detect the canceled session job and do the cleanup later. It also generates session exceptions that the session client has to deal with while it may be sending requests or getting responses. So in general it is not the recommended approach.

    The default value of the purge parameter is true for Session/DurableSession, so you may call Close(), which is equivalent to Close(true). For BrokerClient, however, the default value of purge is false, because we assume users may want to leave the requests/responses in the broker for later use until they explicitly purge the messages when closing the broker client.

    BR,

    Yutong Sun

    Friday, May 6, 2016 4:04 AM
  • This is basically what we're doing:

    CancellationToken cancellationToken = ...;
    ManualResetEvent completedHandle = ...;
    using (var session = Session.CreateSession(...))
    {
        var sessionJobId = session.GetProperty<int>("HPC_ServiceJobId");
        using (var client = new BrokerClient<IHpcGridService>(session, ...))
        {
            client.SetResponseHandler<DoGridTaskResponse>(response =>
            {
                if (cancellationToken.IsCancellationRequested) { return; }
                /* process result */
                ...
                if (response.IsLastResponse) { completedHandle.Set(); }
            });
            /* submit tasks */
            ...
            cancellationToken.Register(() =>
            {
                try
                {
                    var scheduler = new Scheduler();
                    scheduler.Connect(_clusterName);
                    scheduler.CancelJob(sessionJobId, "cancelled", true);
                }
                catch (Exception ex)
                {
                    /* log error */
                }
            });
            WaitHandle.WaitAny(new [] { completedHandle, cancellationToken.WaitHandle });
            /* end-of-job processing */
            ...
        }
    }

    So we're (implicitly) closing (disposing; I assume they're the same thing) both the BrokerClient and the Session on cancellation. If we don't call CancelJob, however, an admin looking at the HPC job list cannot really tell at a glance that the job was actually cancelled by the user, which is what we were trying to achieve (i.e. cancelled jobs should end up in the same state in Cluster Manager regardless of whether they were cancelled through Cluster Manager or through the client application).

    If you could let me know whether you think there is anything fundamentally dangerous or problematic with this approach, I'd appreciate it.

    Thanks,







    • Edited by wbradney Friday, May 6, 2016 12:55 PM
    Friday, May 6, 2016 12:45 PM

  • Note also that we're not particularly concerned about _graceful_ cancellation of tasks in progress - we're happy to kill them stone dead when a user cancels a job, but we need to cancel quickly and clearly reflect the cancellation in the job status.
    Friday, May 6, 2016 1:10 PM
  • Great, thanks very much for your help Yutong.
    Saturday, May 7, 2016 11:37 AM