Semantics of tasks running on multiple nodes

  • Question

  • Currently with HPC 2012 R2, when the selected resource type is Core, a single task might be split among multiple nodes (e.g. if you ask for 4 cores, you might end up with 3 cores on one node and 1 core on another).

    What are the semantics of the exit codes in that case? Does the process actually get started on both nodes? If so, it looks like the task is currently considered completed if any of the running processes returns an exit code of 0. Is that correct?

    Erik.



    Monday, June 23, 2014 5:23 PM

All replies

  • I am not quite clear on your question, because a single task can only run on a single core; it will not be split across multiple nodes.

    What kind of task are you submitting? A basic task, a parametric sweep, or an MPI task?

    Basically, a single task runs on a single core, and it is considered finished successfully when its return code is 0.

    If your job asks for 4 cores as its minimum resource but contains only one task, the job will occupy 4 cores, but the task will be dispatched to one core to run while the other 3 cores sit idle. You can use "task view xxxx /detailed" to see which core the task used.
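That allocation can be inspected from the command line. A minimal sketch, assuming the HPC Pack `job` and `task` CLI tools are available on a client with the cluster set as the default scheduler (the job name, executable, and the job/task IDs `42.1` are placeholders):

```shell
:: Submit a job that reserves exactly 4 cores but contains a single basic task.
:: The task itself is dispatched to one core; the other 3 stay idle.
job submit /jobname:SingleTask /numcores:4-4 MyApp.exe

:: Inspect which node and core the task actually ran on
:: (replace 42.1 with your real <jobID>.<taskID>).
task view 42.1 /detailed
```

The `/detailed` output includes the allocated nodes, which is how you can verify whether the 4 reserved cores were all on one node or spread across several.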

    Tuesday, June 24, 2014 5:29 AM
  • I am referring here to basic tasks. I have jobs with thousands of basic tasks, where each task requires 4 cores. I have set the job to use "Auto min" and "Auto max" for the number of cores. Sometimes the scheduler decides to run a single basic task (set to 4 cores min and 4 cores max) with 1 core on node A and 3 cores on node B. In those cases, I have only seen traces of the process running on node A, the one with 1 core. So far it seems that the process for the basic task does not even start on node B, where 3 cores are reserved.

    I haven't found any documentation of this behavior. I would like to understand the details in order to write workarounds...

    Thanks

    Tuesday, June 24, 2014 1:34 PM
  • A single task can only run on a single core if you use "Core" as the resource type. If you specify a task to use 4 cores, it will be "under-subscribed": when you request 4-4 cores for a task, HPC will allocate 4 cores to it, but will only run the executable on 1 core.

    You can find this documented in "Over-subscribe or under-subscribe core or socket counts on cluster nodes":

    http://technet.microsoft.com/en-us/library/hh184316(v=ws.10).aspx
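    The idea in that article is that the scheduler's view of a node's core count can be set independently of the physical core count. A rough sketch of the per-node setting, assuming the HPC Pack `node` CLI supports a `/subscribedcores` parameter as described there (the node name `CN001` and the count are placeholders; check the linked article for the exact syntax on your version):

    ```shell
    :: Under-subscribe: tell the scheduler that node CN001 should be treated
    :: as having only 4 schedulable cores, regardless of its physical cores.
    node edit CN001 /subscribedcores:4

    :: Confirm the node's subscribed vs. physical core counts.
    node view CN001
    ```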

    Wednesday, June 25, 2014 3:47 AM
  • If you are not running MPI jobs, your task (application) will only be executed once, on one of the cores allocated to it, even though you asked for 4. So a task will not be split across nodes.
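    For comparison, an MPI task is the case that does span nodes, because mpiexec launches one rank per allocated core across the allocation. A minimal sketch, assuming MS-MPI's mpiexec is installed on the compute nodes (the application name and node count are placeholders):

    ```shell
    :: An MPI task is started through mpiexec, which launches ranks on all
    :: allocated resources, potentially across several nodes.
    job submit /numnodes:2 mpiexec MyMpiApp.exe
    ```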



    Qiufang Shi

    Tuesday, August 26, 2014 1:58 PM