What is supposed to happen if a service throws during static initialization?

  • Question

  • In my service on the head node, I have some initialization occurring in the static initializer. I'm testing it with a job that has a single task.

    When everything is configured correctly, the job starts, the task starts, the task finishes and the job finishes (i.e., everything appears to work as expected).

    When I arrange for the static initialization in the service to fail, what appears to happen is that the head node continually re-starts the task, forever, until I cancel the job on the head node.

    Each task shows the following "result": Broker shut down this service host when shrinking session's resource allocation.

     

    Is this to be expected? Is there another (better) way to initialize the service environment? 


    Friday, July 8, 2011 2:53 PM

All replies

  • Hi,

    If the service's static initialization causes the task to fail and requeue, the job will eventually fail after roughly 1,000 requeued tasks have failed.

    Another possible way to initialize the environment is to use a node preparation task, depending on what kind of initialization it is.

    Sunday, July 10, 2011 5:19 AM
  • If user code throws in a static constructor, the task is considered failed. That is why the broker shuts it down and tries to restart it (on the same or another node).

     

    What behavior do you expect when initialization fails?

    Monday, July 18, 2011 5:33 AM
  • Our deployed service needs to dynamically load some assemblies and do a licence check, and it seemed that a static constructor on the service itself would be a good place to do that work. If we do this, however, and either there are missing assemblies in the deployment or the licence check fails, the job appears to never end (from the client's perspective), and the error log grows on the server.

    I guess I was expecting the task/job to immediately fail and the client be able to capture the details of the FileNotFoundException or LicenceException.

    What we're now doing instead is lazily initializing inside the service operation method, and that seems to give the behaviour we need, but I'm now curious as to why failing in this manner is not also considered as a "failure-retry" situation by the broker.
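
    A minimal sketch of the lazy-initialization approach described above. It is written in Java purely as an illustration, since the thread contains no code and the real service is .NET; `LazyResource`, `Service`, `handle` and `LicenceException` are hypothetical names, not part of any HPC API. The setup runs on the first request, and a failure is remembered and rethrown to each caller as the original exception instead of surfacing as a type-initialization error:

```java
import java.util.function.Supplier;

public class LazyInitDemo {
    /** Hypothetical stand-in for the licence failure mentioned in the post. */
    static class LicenceException extends RuntimeException {
        LicenceException(String message) { super(message); }
    }

    /** Runs the initializer at most once and remembers either the value or the failure. */
    static final class LazyResource<T> {
        private final Supplier<T> initializer;
        private boolean attempted;
        private T value;
        private RuntimeException failure;

        LazyResource(Supplier<T> initializer) { this.initializer = initializer; }

        synchronized T get() {
            if (!attempted) {
                attempted = true;
                try {
                    value = initializer.get();
                } catch (RuntimeException e) {
                    failure = e; // keep the original exception for every later caller
                }
            }
            if (failure != null) {
                throw failure;
            }
            return value;
        }
    }

    /** Hypothetical service: environment setup is deferred until the first request. */
    static final class Service {
        private final LazyResource<String> environment;

        Service(Supplier<String> setup) { this.environment = new LazyResource<>(setup); }

        String handle(String request) {
            // Setup happens here, inside the operation, not in a static initializer.
            return environment.get() + ":" + request;
        }
    }

    public static void main(String[] args) {
        Service ok = new Service(() -> "licensed");
        System.out.println(ok.handle("job-1"));

        Service bad = new Service(() -> { throw new LicenceException("licence check failed"); });
        try {
            bad.handle("job-1");
        } catch (LicenceException e) {
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```

    Whether the failure should be cached, as here, or retried on the next request is a design choice; caching keeps a misconfigured host from repeating an expensive check on every call.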

    But I'm admittedly new to HPC, so I guess there's just something fundamental here that I'm missing.


    Monday, July 18, 2011 12:31 PM
  • Indeed, there is a "failure-retry" mechanism for when a request fails. Whether it applies depends on the exception type the code throws. Take a look at the HPC whitepaper about RetryOperationException.

    In your case, however, it seems that you want to immediately fail the task and ban the node?

    Monday, August 8, 2011 7:26 AM
  • yidingz,

    I'm revisiting this issue now due to recent changes in our licensing scheme. You mention the concept of "banning the node". Is there a way for my service (via the static constructor or while processing a task) to programmatically remove from the Job the node on which it's running and have the current task retried? This would allow us to gracefully report on nodes that are incorrectly configured while still allowing the job to complete successfully on the other nodes.

    Thanks




    • Edited by wbradney Thursday, May 31, 2012 6:45 PM
    Thursday, May 31, 2012 3:11 PM