Fragmented core assignment

  • Question

  • Hi!

    I noticed some unexpected behavior on our HPC cluster: when I submit tasks that each require 4 cores, on a cluster with 8 cores per node, a task sometimes gets assigned 2 cores on one node and 2 cores on another (or any other split, up to one core on each of four nodes).

    Since I'm running just a single program that then spawns 4 OpenMP threads, I'm wondering which node my tasks actually run on. If the cores are split like that, is the binary started on all of those nodes? If so, I might be spawning many more threads than intended. Even if it's started on only one node, I could end up with four programs running on the same node, each spawning four threads, i.e., 16 threads on 8 cores (see the diagnostic sketch after this post).

    Is there any way to prevent this from happening, e.g., something like an AllCoresOnOneNode option?


    Thanks a lot,
    Christoph
    Friday, January 16, 2009 12:08 AM
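
    To see whether that kind of oversubscription is actually happening, a quick diagnostic helps. The sketch below is minimal and uses only standard OpenMP calls from omp.h plus the Windows COMPUTERNAME environment variable; nothing scheduler-specific is assumed, and the file name is purely illustrative.

        /* threadcheck.c - report which node the binary runs on and how many
           threads OpenMP actually spawns there.
           Build, e.g.:  cl /openmp threadcheck.c   or   gcc -fopenmp threadcheck.c */
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        int main(void)
        {
            /* COMPUTERNAME is set by Windows; it shows which node this process landed on. */
            const char *node = getenv("COMPUTERNAME");
            printf("running on node: %s\n", node ? node : "(unknown)");

            /* Processors visible to this process vs. the threads OpenMP intends to use. */
            printf("procs visible: %d, max threads: %d\n",
                   omp_get_num_procs(), omp_get_max_threads());

            #pragma omp parallel
            {
                /* Report once from inside the parallel region. */
                #pragma omp single
                printf("threads actually spawned: %d\n", omp_get_num_threads());
            }
            return 0;
        }

    Running this as the task binary shows at a glance whether the thread count matches the 4 cores the task asked for or the node's full 8 cores.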

Answers

  • We've looked into an option like that and are strongly considering adding it in a future version, but unfortunately it's not available in the current release.

    For now, try adding the flag /corespernode:4 to your job; this should ensure the task is only scheduled on a node with at least enough cores for your OpenMP threads, which should generally avoid the fragmentation problem (a small sketch for capping the thread count follows this post).

    Thanks!
    -Josh
    Monday, January 19, 2009 7:16 PM
    Moderator
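
    One follow-up: even once /corespernode:4 keeps all of a task's cores on a single node, two such tasks can still share an 8-core node, so it is worth making sure each program starts exactly as many threads as the cores it requested rather than OpenMP's usual default of one thread per processor on the node. A minimal sketch of that cap, where the hard-coded 4 is an assumption that must match the cores requested per task:

        /* Cap the OpenMP thread count to the cores requested for this task, so two
           tasks sharing an 8-core node run 4 + 4 threads instead of 8 + 8. */
        #include <stdio.h>
        #include <omp.h>

        #define CORES_REQUESTED 4   /* assumption: matches the cores asked for per task */

        int main(void)
        {
            omp_set_num_threads(CORES_REQUESTED);

            #pragma omp parallel
            {
                #pragma omp single
                printf("running with %d threads\n", omp_get_num_threads());
                /* ... the real work would go here ... */
            }
            return 0;
        }

    Setting the standard OMP_NUM_THREADS environment variable to 4 in the task's environment achieves the same cap without recompiling.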