MPI process affinity layout

  • Question

  • I am in the process of setting up a Windows HPC cluster (HPC Pack 2012 R2 U3) and for the most part it is working fine.

    My nodes have two X5570 Xeons, each with 4 cores (8 cores per node). I have one very significant user who requires hyperthreading to be enabled, so some nodes have 16 virtual cores. I am trying to control the placement of 8-core MPI jobs on the nodes that have hyperthreading enabled, i.e. have the processes spread out uniformly across the virtual cores (0, 2, 4, 6, 8, 10, 12, 14).

    I have tried:

    • -affinity -affinity_layout spread
    • -affinity -affinity_layout spread:P
    • -affinity -affinity_layout spread:N
    • -affinity -affinity_layout balanced
    • -affinity -affinity_layout balanced:P
    • -affinity -affinity_layout balanced:N

    but none of these makes any difference at all. The processes are allocated to cores 0-7 more or less every time.

    Where am I going wrong?

    Wednesday, April 6, 2016 10:14 AM

All replies

  • Hi,

    Can you give us some more details about how you launch your applications?

    Since you mentioned HPC Pack, I presume you are using job submit? If that's the case, can you check your cluster configuration for AffinityType? You can do that with the 'cluscfg listparams' command.
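
    For example, running the following on the head node should list the scheduler configuration parameters, AffinityType among them:

     cluscfg listparams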

    Thanks,

    tuba

    Wednesday, April 6, 2016 6:59 PM
  • Hi Tuba,

    Thanks for your reply. 

    I launch the application via the HPC Pack Job Manager with a command line like

     "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -affinity ...

    cluscfg listparams returns:

     AffinityType                     : NonExclusiveJobs

    Thanks,

    Richard.

    Thursday, April 7, 2016 8:10 AM
  • Hi Richard,

    HPC Pack has its own resource allocation and affinity settings for the tasks it runs, and on top of that there is mpiexec's affinity option. Mpiexec runs its affinity algorithm only on the resources that HPC Pack has assigned to the task. You can get more detailed information on HPC Pack's affinity settings from https://technet.microsoft.com/en-us/library/cc947603.aspx; look for AffinityType.

    In your case, it looks like you are submitting a job with 8 processes, HPC Pack is allocating cores 0-7 to that task, and mpiexec then spreads its processes across those same cores 0-7. However, I understand that you want the whole node allocated to this task. There are a few ways to achieve this:

    • Keep AffinityType as NonExclusiveJobs. While submitting the job, select Node for resource type (command line equivalent would be job submit /numnodes:1-1 ...)
    • Keep AffinityType as NonExclusiveJobs. While submitting the job, keep Core for resource type, set minimum to 16 (job submit /numcores:16-16 ...)
    • Keep AffinityType as NonExclusiveJobs. While submitting the job, check 'Use assigned resources exclusively for this job' (job submit /exclusive ...)
    • Change AffinityType to NoJobs (cluscfg setparams AffinityType=NoJobs)

    Of course, the best option depends on your scenario; however, I'd suggest the last one as a general-purpose solution.

    With any of the above and -affinity_layout spread, you should be able to use virtual cores 0, 2, 4, ..., 14.

    With any of the above and -affinity_layout spread:P, you should be able to use physical cores 0, 1, 2, ..., 7, i.e. rank 0 runs on virtual cores 0-1, rank 1 runs on virtual cores 8-9, etc.
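
    For concreteness, the end-to-end submission with the first option could look roughly like the following sketch; MyApp.exe is just a placeholder for your application and its arguments:

     job submit /numnodes:1-1 "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -affinity -affinity_layout spread MyApp.exe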

    Please let us know if this helps,

    Thanks,

    tuba


    Thursday, April 7, 2016 8:24 PM
  • Hi Tuba,

    Thank you very much for your very detailed reply. I haven't managed to test this yet due to license server problems, and I am about to go away for a few days. I will post an update in due course.

    Regards,

    Richard.

    Friday, April 8, 2016 2:14 PM
  • Hi Tuba,

    Got this to work by selecting the Node resource type and specifying -cores 8 -affinity_layout spread on the mpiexec command line (the nodes have 8 physical cores with hyperthreading enabled, i.e. 16 virtual cores).

    So, with 2 nodes (/numnodes:2-2) and mpiexec -cores 8 -affinity_layout spread, I get the desired behaviour: a 16-core job runs across two nodes, with 8 processes on each node evenly spread across that node's virtual cores.

    This seems to work irrespective of whether AffinityType is set to NoJobs or NonExclusiveJobs, and irrespective of whether 'Use assigned resources exclusively for this job' is checked.
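
    In other words, the command-line equivalent of what I ended up running looks roughly like this (MyApp.exe stands in for the actual application):

     job submit /numnodes:2-2 "C:\Program Files\Microsoft MPI\Bin\mpiexec.exe" -cores 8 -affinity_layout spread MyApp.exe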

    Many thanks for your help.

    Richard

    Friday, April 22, 2016 11:40 AM