OpenMP job submission - "cluster fragmentation"

  • Question

  • Hi everybody,

    I have the following problem:
    On our cluster, we run some single-core, single-task jobs (the code is not parallelized), and these jobs usually stay there for days and sometimes weeks.
    On the other hand, we have some software which is OpenMP- or MPI-parallel.
    For the OpenMP jobs, I want to allocate one node exclusively, but these jobs are usually queued for a long time, because there is almost always at least one single-core job or one process of an MPI job on every node, even if there would be space for them on other nodes.
    (At the moment, all my nodes run between 12.5% and 25% load, and my OpenMP job stays in the queue.)
    I want the job scheduler to "fill up" nodes with single-core and MPI processes before it begins to start processes on empty nodes.
    Right now, it seems that the scheduler tries to distribute all jobs equally over all nodes, which really "fragments" our cluster.


    Thank you for your answers,
    and have a nice weekend


    Florian Kummer
    Fluid Dynamics Group
    Technical University of Darmstadt
    (Germany)
    Friday, October 17, 2008 2:25 PM

Answers

  • Florian,
     The problem you described was fixed in the RTM version. The server version for that is 2.0.1551.
    I suggest that you update to the RTM version.

    After the upgrade, you should use the /orderby:-cores option for all the single-core jobs.
    The best way would be to create a job template for the single-core jobs with this option set.
    The users submitting the single-core jobs should use this job template to submit their jobs (a command-line sketch follows this reply).


    This should give you the desired behaviour.

    Thanks,
    Sayantan
    Monday, November 24, 2008 7:20 AM
    Moderator

All replies

  • Florian,
    This is a great question!

    The default scheduler behavior is to sort available nodes by the number of cores available and give a job the node with the most available cores.  You can override this setting per-job by using the order-by property on the job.  So in your case, you should submit your long-running, serial jobs with the order-by set to give the node with the fewest available cores first . . . this will load up those nodes with serial jobs and leave empty nodes empty for your parallel workload.

    To make this setting, add the flag "/orderby:-cores" to your job new command, or select "Prefer nodes with Less Cores" on the Resource Selection tab of the job submission UI.  If you do this when submitting your single-core jobs, I believe it will give you the desired behavior (a command-line sketch follows this reply).



    Thanks!
    Josh
    Tuesday, October 21, 2008 6:05 PM
    Moderator
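    A minimal sketch of the per-job flag described above, together with an exclusive submission for the OpenMP job. Only /orderby:-cores is taken from the replies above; the one-step "job submit <command>" form, the /numnodes and /exclusive options, and the executable names are assumptions about this CLI version (check "job new /?" and "job submit /?" on the cluster):

        REM Serial job: prefer the nodes with the fewest free cores, so partly used nodes fill up first
        job submit /orderby:-cores mySerialSolver.exe input.dat
        REM OpenMP job: request one whole node for exclusive use
        job submit /numnodes:1 /exclusive:true myOpenMPSolver.exe input.dat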
  • I just ran a test on this and it doesn't function as you described.

    I ran a job with 4 cores and it picked node hpc1, which has 4 cores on it.
    I then ran a job with 2 cores with "prefer nodes with less cores" set in the Resource Selection tab.  It placed the job on HPCMaster.

    A while back I tested the memory one as well, with the same effect.  Basically, from what I can tell, it just reads hardware configurations, not available resources.  If it did read available resources it would be the best thing since sliced bread....

    Wednesday, October 22, 2008 9:12 AM
  • I also have to say that this solution doesn't seem to work. It seems that the scheduler/allocator only looks for nodes with fewer "physically present" cores, not for "unused" cores.

    My server version is 2.0.1452.0.

    Right now, I solve the problem with two node groups, one for scalar and MPI jobs and one for OpenMP jobs (a submission sketch follows this reply). Maybe I will implement some filters one day to enforce that rule.

    But maybe there is, or will come, something more elegant...
    Wednesday, October 22, 2008 12:42 PM
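    A minimal sketch of the node-group workaround described above, assuming the two groups have been defined in the cluster management console; the group names SerialNodes and OmpNodes are hypothetical, and the /nodegroup, /numnodes, and /exclusive option names should be checked against the local "job submit /?" output:

        REM Scalar and MPI work is restricted to the (hypothetical) SerialNodes group
        job submit /nodegroup:SerialNodes mySerialSolver.exe input.dat
        REM OpenMP work gets a whole node from the (hypothetical) OmpNodes group
        job submit /nodegroup:OmpNodes /numnodes:1 /exclusive:true myOpenMPSolver.exe input.dat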