Two jobs toggle between Running and Queued (nonstop)

  • Question

  • Hello everyone,

    We are using Microsoft HPC Pack 2012 R2 with resource pools enabled.
    All our jobs use Node Preparation tasks and Node Release tasks.
    The scheduling mode is set to "Queued" and the preemption mode to "Graceful". "Increase Resources Automatically", "Decrease Resources Automatically", and "Grow by Preemption" are also checked.

    Our problem is that at least two jobs from different pools regularly go into the Running state and then back to Queued. Either Job 1 from Pool A is running, or Job 2 from Pool B is running.

    When Job 1 switches to Queued, Job 2 switches to Running. These two jobs switch state nonstop, and while they are Running they process only the Node Preparation task and the Node Release task.

    When the Node Preparation task and the Node Release task have finished, the head node takes this resource (node) away from the job and assigns it to the other job. There, again, only the Node Preparation task and the Node Release task are executed.

    After this it switches again, and so on. The node is busy only with executing Node Preparation and Node Release tasks. The number of these tasks inside the jobs grows very large, and the progress bar goes up to 99% even though almost no real "worker task" has been executed.

    It seems to me that the scheduler cannot decide which job to assign this resource to, so it keeps reassigning it, with the effect that the resource is never productive. When we detect this behaviour, we try to stop it by changing the priority of one of the jobs, but then the toggling continues with two other jobs.

    What can we do to make real use of this resource (node)?
    Has anybody else seen this behaviour on their cluster?
    What can we do to get rid of it?
    Is this a bug in the scheduler?

    Any help would be welcome.

    Thank you very much and best regards,

    Bobby



    • Edited by Bobby013 Tuesday, February 17, 2015 10:17 PM wrong statement
    Tuesday, February 17, 2015 10:09 PM
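
The 99% progress bar described in the question can be illustrated with a small tally. This is only a sketch under two assumptions that the thread does not confirm: that each hand-over of the node adds one finished Node Preparation task and one finished Node Release task to the job, and that progress is computed as finished tasks divided by all tasks. The function name and the numbers are hypothetical.

```python
def progress_after_cycles(cycles, worker_tasks_total, worker_tasks_done=0):
    """Return (total_tasks, finished_tasks, percent) for one job.

    Assumption (not confirmed by the thread): each time the node is handed
    to the job, one Node Preparation task and one Node Release task are
    added and finish, and progress = finished tasks / all tasks.
    """
    overhead = 2 * cycles  # one prep task + one release task per hand-over
    total = worker_tasks_total + overhead
    finished = worker_tasks_done + overhead
    percent = 100.0 * finished / total
    return total, finished, percent


# Hypothetical job: 10 worker tasks, none of them done, node toggled 495 times.
total, finished, percent = progress_after_cycles(cycles=495, worker_tasks_total=10)
print(total, finished, percent)
```

Under these assumptions, the overhead tasks alone push the ratio close to 100% while no worker task has run, which matches the symptom reported above.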

All replies

  • We cannot repro the problem.

    One known issue is that resource pools don't work with node groups, so if you specified node groups, you'd need to turn off the resource pools.

    With the above issue solved, please send me (evanc@microsoft.com) the scheduler logs if this still occurs.

    Thanks,
    Evan

    Thursday, February 26, 2015 9:11 AM
  • Hi Evan,

    Thank you very much for your answer. We do indeed also use node groups, because we have around 100 nodes fitted with an additional physical PCI Express card, which lets a specific kind of job run its calculations faster.

    So we need this kind of group to run specific jobs on dedicated nodes.

    On the other hand, we depend on these multiple resource pools (we have around 10) to be able to support different projects in parallel.

    So for us it is not possible to turn off either pools or groups.

    What should I do now? Do you know whether Microsoft is planning a bug fix, and if so when, or should I contact official support?

    Thank you very much for your help,

    best regards,

    Bobby

    Thursday, February 26, 2015 10:56 AM
  • Hi Bobby,

    For now, resource pools cannot work with node groups; this is by design and documented explicitly. The effort to support it would be huge, nearly impossible considering the theoretical complexity.
    I recommend that you use only node groups, with Balanced mode and properly set job priorities, as an alternative to the resource balancing that resource pools provide.

    Thanks,
    Evan

    Tuesday, March 3, 2015 9:44 AM
  • Hi Evan, 

    Thank you very much for your answer. I have now searched the documentation for the information you gave here and found it on MSDN. It was our mistake to assume that pools and groups can be used together.

    Let's assume we move our 10 pools into 10 different groups. Currently, in pool mode, we have the benefit that resources from pools that are not in use can be used by other pools. So we have dynamic allocation of resources, and resources are not idle, at least when we have enough jobs in the active queue.

    If we now change to groups, it can happen that some resources sit idle because no active job is available for that group's queue.

    So, as I understand it, we would have idle resources that we could otherwise use for other jobs. To prevent this, the only option would be to manually readjust the nodes in each group several times a day while the cluster is online. Is this understanding correct, or am I wrong?

    Another idea I had is the following:

    The special hardware we use can be compared to special software that is installed only on some dedicated nodes, not with floating licenses but with fixed, machine-dependent licenses. Would it be possible to remove this special hardware group and replace it somehow with a virtual software license, telling the job to run its tasks only on the machines where the license is installed? In principle the two features are comparable, but in the documentation I found only examples of how to deal with the Flex license manager in such a case. In my situation we would have a machine-dependent license.

    Do you know whether this could be a way to solve it?

    Thank you very much for your support Evan,

    best regards

    Bobby

    Thursday, March 5, 2015 8:31 AM
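
The concern in the post above, that fixed groups can leave nodes idle while another project still has queued work, can be made concrete with a toy calculation. This is a rough sketch, not a model of the HPC Pack scheduler: the group names, sizes, and demand figures are hypothetical, and the "shared" case simply treats all nodes as one pool without any per-pool guarantees.

```python
def idle_nodes_static(group_sizes, demand):
    """Idle nodes when each project may only use the nodes of its own group."""
    return sum(
        max(size - demand.get(name, 0), 0)
        for name, size in group_sizes.items()
    )


def idle_nodes_shared(group_sizes, demand):
    """Idle nodes when unused nodes can be lent to any other project."""
    total_nodes = sum(group_sizes.values())
    total_demand = sum(demand.values())
    return max(total_nodes - total_demand, 0)


# Hypothetical cluster: three projects with 10 nodes each.
groups = {"A": 10, "B": 10, "C": 10}
# Hypothetical demand, in nodes requested by queued and running jobs.
demand = {"A": 25, "B": 5, "C": 0}

print(idle_nodes_static(groups, demand))  # 15: B leaves 5 idle, C leaves 10 idle
print(idle_nodes_shared(groups, demand))  # 0: project A can use the spare nodes
```

With disjoint groups, project A stays short of nodes while 15 nodes sit idle, which is the situation that would otherwise need manual regrouping.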
  • Hi Bobby,

    Will the following design solve your problem?

    Create a node group GPU that contains all your nodes with PCIe cards, and create a node group Others (or whatever name you prefer) that contains the other nodes. Submit jobs that depend on the GPU only to the node group GPU, and submit jobs that don't depend on the GPU to the union of GPU and Others.

    This way, GPU jobs will execute only on GPU nodes, but other jobs can occupy the GPU nodes when they are available.

    To prevent other jobs from taking too many GPU resources, give the GPU jobs a higher priority, so that GPU jobs get those resources efficiently.

    For floating licenses, I don't think there is a built-in solution. You would probably need to solve that with some logic outside of HPC.

    Thanks,
    Evan

    Thursday, March 5, 2015 9:55 AM
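
The layout suggested in the last reply can be sketched as a small eligibility model. This is an illustration only and does not call any HPC Pack API: the node names, job fields, and the greedy "highest priority first" allocation are assumptions made for the example, and the real scheduler's placement and preemption logic is more involved.

```python
# Hypothetical node groups: GPU holds the nodes with PCIe cards.
GPU = {"node01", "node02"}
OTHERS = {"node03", "node04", "node05"}


def eligible_nodes(needs_gpu):
    """GPU jobs may use only the GPU group; other jobs may use GPU plus Others."""
    return set(GPU) if needs_gpu else GPU | OTHERS


def schedule(jobs, free_nodes):
    """Greedy sketch: serve jobs by descending priority from the free nodes."""
    free = set(free_nodes)
    allocation = {}
    for job in sorted(jobs, key=lambda j: -j["priority"]):
        candidates = sorted(eligible_nodes(job["needs_gpu"]) & free)
        taken = candidates[: job["nodes"]]
        allocation[job["name"]] = taken
        free -= set(taken)
    return allocation


all_nodes = GPU | OTHERS
gpu_job = {"name": "gpu_job", "needs_gpu": True, "nodes": 2, "priority": 2}
cpu_job = {"name": "cpu_job", "needs_gpu": False, "nodes": 4, "priority": 1}

# With both jobs queued, the higher-priority GPU job gets the GPU nodes first.
print(schedule([cpu_job, gpu_job], all_nodes))
# With no GPU job queued, the other job may also occupy the GPU nodes.
print(schedule([cpu_job], all_nodes))
```

In this sketch the higher priority of the GPU job keeps the card-equipped nodes available to it, while those nodes still do useful work for other jobs whenever no GPU job is waiting.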