Questions regarding job submission and failover

  • Dear all,

    Can anybody answer the questions below? Please help me here.

    1. In HPC, how does a worker node pick up the job assigned to it? Does it poll the database and retrieve the assigned job directly from the DB? A detailed architecture diagram of HPC would be very helpful, if one is available.

    2. When a user has a SOA session open and the node on which the service is running fails, what happens? I know the system deploys the service on another node and tries to keep the session running, but: (a) Will HPC consider the job's resource requirements when finding another node? (b) What happens if no resources are available?

    3. I am running a .NET executable inside one of my job's tasks. I want to gracefully stop this task and the .NET process associated with it. How can I do that? Is there any way to pass a graceful stop command from HPC to the task's executable process?



    Puneet Sharma

    Thursday, March 2, 2017 2:30 AM

All replies

  • I'll try to answer your questions, but my answers may not be what you want, as I didn't quite understand what problem you're facing when asking them.

    1. Jobs are task containers, and resources are allocated to jobs (to meet each job's min/max). The main logic is as follows:

        1. There is a policy engine in the scheduler that determines resource allocation (submission time, job priority, job settings, scheduling mode, etc. all affect the allocation).
        2. Once the job has been allocated its minimum resources, it moves from "Queued" to "Running". While the job is running, the scheduler assigns core resources to the tasks inside it. The running job is then managed by a jobMonitor instance, which handles growing the job's resources, shrinking them, etc. More importantly, the jobMonitor starts a task dispatcher engine.
        3. The task dispatcher then dispatches every task that has been assigned resources, and a taskMonitor instance is created to manage the whole life cycle of each running task.
        4. Job/Task/ResourceAssignment state information is written to the DB (as a transaction) so that the system can recover after a failover.
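    To make the flow above concrete, here is a toy Python model. Everything in it (the `Job` class, `open_store`, `try_start`, the SQLite tables) is hypothetical and is not HPC Pack's scheduler code; it only mirrors the steps described: a job leaves "Queued" only once its minimum cores can be granted, its tasks are then dispatched onto the granted cores, and the resulting job/task state is written in a single transaction. The policy engine's ordering and the jobMonitor's grow/shrink logic are deliberately left out.

```python
import sqlite3
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"


@dataclass
class Job:
    # Hypothetical job record: a container of tasks with a min/max core request.
    job_id: int
    min_cores: int
    max_cores: int
    tasks: list[str]
    state: JobState = JobState.QUEUED
    cores: int = 0


def open_store() -> sqlite3.Connection:
    # Hypothetical schema standing in for the scheduler database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, cores INTEGER)")
    db.execute("CREATE TABLE tasks (job_id INTEGER, name TEXT, state TEXT)")
    return db


def try_start(job: Job, free_cores: int, db: sqlite3.Connection) -> int:
    """Start the job only if its minimum can be met; return the cores left over."""
    grant = min(job.max_cores, free_cores)
    if grant < job.min_cores:
        return free_cores  # not enough resources: the job stays Queued
    job.state, job.cores = JobState.RUNNING, grant
    # Toy "task dispatcher": one core per task, dispatch as many tasks as fit.
    dispatched = job.tasks[:grant]
    # Persist job and task state atomically: the connection context manager
    # commits on success and rolls back if any statement raises.
    with db:
        db.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)",
                   (job.job_id, job.state.value, job.cores))
        db.executemany("INSERT INTO tasks VALUES (?, ?, 'Running')",
                       [(job.job_id, name) for name in dispatched])
    return free_cores - grant
```

    For example, with 4 free cores a job asking for at least 8 stays queued, while a job asking for 2 to 4 cores starts with all 4 and has its 3 tasks dispatched. Writing job and task rows in one transaction is what lets a restarted scheduler see either the whole state change or none of it.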

    2. SOA is more complicated, as requests/responses are managed by the broker node. The scheduler respects the SOA job's requirements (the min/max requirement and other settings), starts SOAServiceHost on the allocated resources, and that's all. For (a), yes, the scheduler will consider the job's resource requirements. For (b), if a node fails, the scheduler checks whether the SOA job's resources have dropped below its "min"; if so, it cancels the job.
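    A minimal sketch of the failure rule in (b), assuming a hypothetical `SoaJob` record that tracks the cores allocated on each node. It models only what the reply states: when a node fails, its cores are removed from the job, and the job is canceled if what remains is below its minimum; otherwise it keeps running with fewer cores. Broker behavior and re-placement on other nodes are not modeled, and none of these names come from HPC Pack.

```python
from dataclasses import dataclass, field


@dataclass
class SoaJob:
    name: str
    min_cores: int
    # node name -> cores currently allocated to this job on that node
    allocation: dict[str, int] = field(default_factory=dict)
    state: str = "Running"

    @property
    def cores(self) -> int:
        return sum(self.allocation.values())


def on_node_failed(job: SoaJob, node: str) -> str:
    """Drop the failed node's cores and cancel the job if it falls below min."""
    job.allocation.pop(node, None)
    if job.cores < job.min_cores:
        job.state = "Canceled"
    return job.state
```

    With `min_cores=4` and 4 cores on each of two nodes, losing one node leaves the job running on 4 cores; losing the second drops it to 0 and the job is canceled.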

    3. By default, when an admin or the job owner cancels a running job, the scheduler sends a "Ctrl+C" event (or possibly Ctrl+Break; I'm not quite sure) to the process running on the node and waits for the process to exit. If it doesn't exit within 15 seconds (this value is configurable), the process is killed. So you can exit gracefully by handling the "Ctrl+C" event.
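    On the task side, exiting gracefully means catching the console interrupt and cleaning up before the grace period expires. The sketch below is in Python; the task in the question is .NET, where the corresponding hook is the `Console.CancelKeyPress` event. The `GracefulWorker` class and its methods are hypothetical. The handler only sets a flag, and the work loop checks that flag between units of work. Because the reply is unsure which console event is sent, the sketch also registers Python's Windows-only `SIGBREAK` (Ctrl+Break) handler when it exists.

```python
import signal
import time


class GracefulWorker:
    """Runs work in small steps and stops cleanly when a console interrupt arrives."""

    def __init__(self) -> None:
        self.stop_requested = False
        self.steps_done = 0
        # Python maps Ctrl+C to SIGINT on both Windows and POSIX.
        signal.signal(signal.SIGINT, self._request_stop)
        # Ctrl+Break maps to SIGBREAK, which exists only on Windows.
        if hasattr(signal, "SIGBREAK"):
            signal.signal(signal.SIGBREAK, self._request_stop)

    def _request_stop(self, signum, frame) -> None:
        # Keep the handler tiny: just record that a stop was requested.
        self.stop_requested = True

    def run(self, max_steps: int) -> str:
        for _ in range(max_steps):
            if self.stop_requested:
                self.cleanup()
                return "stopped"
            time.sleep(0.01)  # stand-in for one short unit of real work
            self.steps_done += 1
        self.cleanup()
        return "finished"

    def cleanup(self) -> None:
        # Flush output files, release locks, etc., well within the grace period.
        pass
```

    Limiting the handler to setting a flag avoids doing complex work inside a signal handler; the trade-off is that the process only notices the request between work steps, so each step should be short relative to the grace period.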

    Qiufang Shi

    Thursday, March 2, 2017 9:41 AM