No response to SOA calls

  • Question

  • We have an application that creates SOA jobs on HPC 2012 R2 clusters and then makes calls to those jobs to do calculations. The application generally works fine, but on one specific test server, if I leave the application service idle for a while (about 30 minutes) and then trigger a SOA call to HPC 2012 R2, the call never reaches the SOA job it is supposed to go to. If I then cancel the SOA jobs on the HPC 2012 R2 cluster and trigger another SOA call, my application reports an error, so it looks like my application had maintained a valid connection to the HPC grid.

    This problem only occurs on one server.

    Is there a way to trace what the HPC head node or broker does with a SOA call?



    Thursday, November 24, 2016 11:41 PM

All replies

  • Hi Dan,

    You may enable SOA message level tracing (see How-to) to obtain a general view of how the SOA requests/calls sent from the client to the broker worker on the broker node are dispatched to the service hosts on the compute nodes.

    Please note that a SOA service has several timeouts, e.g. ClientIdleTimeout and SessionIdleTimeout. If the client stays idle beyond the timeout, the client could be disconnected. If the session stays idle beyond the timeout, the session could be automatically closed (or suspended, for a durable session), and the service job would also be finished. If you manually cancel the service job, the SOA session that depends on it could also be closed. You cannot send a request (trigger a SOA call) to a closed session because the broker worker is already unloaded.


    Yutong Sun

    Friday, November 25, 2016 7:56 AM
  • Thank you for your reply Yutong.

    ClientIdleTimeout and SessionIdleTimeout have been set to very large values, so I don't think they are causing the issue.

    I have tried turning on SOA message level tracing for my job, and it looks like the job didn't receive the message at all. Is it possible that the head node received the message but didn't pass it on to my SOA job? Is there a way to trace that?


    Monday, November 28, 2016 7:56 PM
  • Hi Dan,

    Once you have enabled SOA message level tracing, you can check the message details in HPC Cluster Manager: go to Job Management -> SOA Jobs, right-click the SOA job in the job list and choose "View Message Details". In the pop-up dialog, you can check the status of each message, e.g. when the request was received by the broker, when the request was dispatched and to which target machine, whether the request was processed successfully, and when the response was received. If an exception occurred, the exception details are shown as well.

    If there is an error when processing messages in the broker worker, we need to collect the detailed broker worker trace files on the broker node for further investigation. By default, the broker worker trace files are located in the %CCP_DATA%LogFiles\SOA folder on each broker node, with names like HpcBrokerWorker_*.bin. There may be a large number of them, so you may want to run 'net stop hpcbroker' to stop the broker service, delete all the existing broker worker trace files, and run 'net start hpcbroker' to start the broker service again. Then repro the issue, copy and zip the HpcBrokerWorker_*.bin files, and send the zip to me at yutongs@microsoft.com for further investigation.
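The copy-and-zip step above could be scripted roughly as follows. This is only a minimal sketch in Python, not an HPC Pack tool: the function name `zip_broker_traces` and the optional `modified_after` filter are assumptions made for illustration. The filter is meant for cases where the old trace files cannot be deleted first, and it may not be able to read files that the broker still holds open.

```python
import glob
import os
import zipfile


def zip_broker_traces(log_dir, zip_path, modified_after=None):
    """Zip the HpcBrokerWorker_*.bin files found directly in log_dir.

    modified_after is an optional POSIX timestamp (hypothetical helper
    parameter); files last modified before it are skipped.
    Returns the names of the files added to the archive.
    """
    pattern = os.path.join(log_dir, "HpcBrokerWorker_*.bin")
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(glob.glob(pattern)):
            if modified_after is not None and os.path.getmtime(path) < modified_after:
                continue  # older than the cutoff, leave it out
            name = os.path.basename(path)
            archive.write(path, arcname=name)
            added.append(name)
    return added


if __name__ == "__main__":
    # Assumes this runs on the broker node, where CCP_DATA is defined.
    ccp_data = os.environ.get("CCP_DATA")
    if ccp_data:
        soa_logs = os.path.join(ccp_data, "LogFiles", "SOA")
        print(zip_broker_traces(soa_logs, "broker_traces.zip"))
```

Passing the time just before the repro as `modified_after` keeps the archive limited to trace files written during or after the repro.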

    Please note that the head node is not necessarily a broker node, and there could be multiple broker nodes in the cluster, each of which handles one SOA session at a time. You can run 'job view <jobId> /detailed | findstr Broker' to see which broker node the SOA session is running on.


    Yutong Sun

    Wednesday, November 30, 2016 6:32 AM
  • Thanks a lot Yutong!

    I did bring up the screen you attached, but I couldn't find the new message I sent. It looks like it never got there.

    I also used hpctrace to check the log files under %CCP_DATA%LogFiles\SOA, and the files I checked were all empty. Unfortunately, our grid is shared with other users and is busy all the time, so I can't stop the broker (we have only one) to clear the old files.

    I'll investigate further later to see if I can find more information. Thanks for the help!


    Thursday, December 1, 2016 6:18 PM