I've noticed some unexpected behavior on our HPC cluster: when I submit tasks that require 4 cores each on a cluster with 8 cores per node, a task sometimes gets assigned 2 cores on one node and 2 cores on another node (and other split combinations, up to one core on each of four nodes).
Since each task is just a single program that then spawns 4 OpenMP threads, I'm wondering which node my tasks actually run on when the cores are split like that. Is the binary started on all of the assigned nodes? If so, I might be spawning many more threads than intended. Even if it's started on only one node, I could end up with four such programs on the same node, each spawning four threads, i.e., 16 threads running on 8 cores.
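For reference, each task is essentially a single process structured like this (a minimal sketch, not my exact code; the thread count is hard-coded to 4 here just for illustration):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Each process requests its own team of 4 threads, regardless of
           what any other process on the same node is doing. */
        omp_set_num_threads(4);

        #pragma omp parallel
        {
            #pragma omp single
            printf("this process is running %d threads on this node\n",
                   omp_get_num_threads());

            /* ... per-thread work ... */
        }
        return 0;
    }

So if four of these processes land on one 8-core node, each still starts its own 4 threads.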
Is there any way to prevent this from happening, e.g., something like an AllCoresOnOneNode option?
We've looked into an option like that, and are strongly considering adding it for a future version, but it's unfortunately not available in the current release.
For now, try adding the flag /corespernode:4 to your job; this ensures the task is only scheduled on nodes that have at least enough cores for your OpenMP threads, which should generally avoid the problem.
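For example, if you submit from the command line, the invocation might look roughly like this (the job submit command and the /numcores option are assumptions about how you currently submit; the executable name is a placeholder, and only /corespernode:4 is the new part being suggested):

    job submit /numcores:4 /corespernode:4 mytask.exe

With that in place, the scheduler should only consider nodes that can provide all 4 cores on a single node, rather than splitting the allocation across nodes.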