Avoid even distribution of jobs among nodes

  • Question

  • Hi, we are running HPC Server 2008 on our cluster. Many people submit jobs, and many of these are long-running single-core jobs. By default, our system distributes these jobs evenly among the nodes.

    Now, when I want to run parallel jobs that need shared memory, I would like to get a full node. However, some cores are always occupied by other people's single-core jobs, so I usually either cannot request all 8 cores of a node or have to wait forever until a full node is free.

    So my question is: is there a way to change the default scheduling so that jobs are, where possible, started on the same node, leaving as many nodes as possible completely free for parallel jobs?
    Monday, April 19, 2010 9:55 PM
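The effect described in the question can be illustrated with a toy model. This is plain Python and not the actual HPC Server 2008 scheduling algorithm: single-core jobs are placed one at a time either on the node with the most free cores ("spread", the even distribution) or on the busiest node that still has a free core ("pack"), and the model counts how many nodes stay completely idle. The function names `place` and `idle_nodes` are hypothetical.

```python
def place(jobs, nodes, cores_per_node, policy):
    """Place `jobs` single-core jobs one at a time; return free cores per node."""
    free = [cores_per_node] * nodes
    for _ in range(jobs):
        # Only nodes with at least one free core can take the next job.
        candidates = [i for i in range(nodes) if free[i] > 0]
        if not candidates:
            raise ValueError("no free cores left in the cluster")
        if policy == "spread":
            # Even distribution: pick the node with the most free cores.
            target = max(candidates, key=lambda i: free[i])
        elif policy == "pack":
            # Packing: pick the busiest node that still has a free core.
            target = min(candidates, key=lambda i: free[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        free[target] -= 1
    return free


def idle_nodes(free, cores_per_node):
    """Count nodes with all cores free, i.e. usable for a whole-node shared-memory job."""
    return sum(1 for f in free if f == cores_per_node)


if __name__ == "__main__":
    for policy in ("spread", "pack"):
        free = place(jobs=8, nodes=4, cores_per_node=8, policy=policy)
        print(policy, free, "fully idle nodes:", idle_nodes(free, 8))
```

With 4 nodes of 8 cores and 8 single-core jobs, spreading leaves `[6, 6, 6, 6]` free cores and no idle node, while packing leaves `[0, 8, 8, 8]` and three whole nodes free for shared-memory jobs.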

Answers

  • Hi,

    You could set "Node ordering" to "-cores" in the Default job template. However, this would also affect MPI-based parallel jobs if they are submitted with the default template.

    Another approach would be a job submission filter. It could iterate through a job's tasks and determine whether a task is serial by analysing the UnitTypes and the number of cores requested per task. If the filter detects a serial task, it could set the job's orderby attribute to "-cores".

    Regards,

    Michael


    • Proposed as answer by MWirtz Friday, April 30, 2010 12:37 PM
    • Marked as answer by Don Pattee (Moderator) Thursday, May 6, 2010 11:40 PM
    Thursday, April 22, 2010 12:49 PM
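A minimal sketch of the submission-filter approach from the answer, written in Python for readability. Several details are assumptions rather than confirmed HPC Server 2008 behaviour: that the scheduler passes the path of the job XML file as the first argument, that exit code 0 accepts the job unchanged and 1 accepts a modified job file, that the job and task elements carry `UnitType`, `MinCores`, `MaxCores` and `OrderBy` attributes, and that a missing `UnitType` means core-based scheduling. The helper names are hypothetical. Check these points against the scheduler documentation before deploying; a real filter must also be registered as an executable the head node can run.

```python
import sys
import xml.etree.ElementTree as ET

# Assumed exit-code convention for an HPC Server 2008 submission filter:
# 0 = accept the job unchanged, 1 = accept the job with a modified job file.
ACCEPT_UNCHANGED = 0
ACCEPT_MODIFIED = 1

# Node ordering suggested in the answer for serial jobs.
SERIAL_ORDER_BY = "-cores"


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}Task' -> 'Task'."""
    return tag.rsplit("}", 1)[-1]


def is_serial_task(task):
    """Treat a task as serial if it requests exactly one core.

    Assumption: the task element carries MinCores/MaxCores attributes;
    a task without them is conservatively treated as not serial.
    """
    return all(task.get(attr, "").strip() == "1" for attr in ("MinCores", "MaxCores"))


def apply_filter(job):
    """Set the job's OrderBy attribute to '-cores' if it has a serial, core-based task.

    Returns True if the job element was modified.
    """
    # Assumption: a missing UnitType means the job is scheduled in cores.
    if job.get("UnitType", "Core") != "Core":
        return False
    tasks = [e for e in job.iter() if local_name(e.tag) == "Task"]
    if not any(is_serial_task(t) for t in tasks):
        return False
    if job.get("OrderBy") == SERIAL_ORDER_BY:
        return False  # already set; nothing to change
    job.set("OrderBy", SERIAL_ORDER_BY)
    return True


def main(argv):
    path = argv[1]  # assumed: path of the submitted job XML file
    tree = ET.parse(path)
    root = tree.getroot()
    if not apply_filter(root):
        return ACCEPT_UNCHANGED
    # Keep the original default namespace instead of ElementTree's "ns0:" prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    tree.write(path, xml_declaration=True, encoding="utf-8")
    return ACCEPT_MODIFIED


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the filter only changes jobs that contain a single-core task, MPI jobs submitted with the default template keep their existing node ordering, which avoids the drawback of changing the template itself.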