Application using MPI supporting more than one Kgroup (processor group)

  • Question

  • Hello,

    I can't find anything about this anywhere and don't know where else to ask. We recently ran into a problem where our application could not use the full machine (Intel Xeon E5 v4, 2 NUMA nodes of 20 cores each). The explanation turned out to be in an advisory from HP: the fix is for the application to use the newer kernel API with processor-group support, and the advisory links to the MSDN Library article on processor groups, which states that newer Windows versions support them.

    Now I am thinking: if the OS supports it (as per that last link), and I cannot find anything about it in the MS-MPI documentation, then either A) it should just work (but it doesn't), or B) MS-MPI needs to be fixed to make this work.
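
    (For reference, a minimal sketch of that OS-level check, using the standard Win32 group APIs from the linked article. This is not from the advisory; it assumes Windows 7 / Server 2008 R2 or later. A process that never calls these group-aware APIs stays confined to the one group it started in, which is consistent with only half the machine being used:)

      /* Minimal sketch (assumes Windows 7 / Server 2008 R2 or later):
       * report how many processor groups the OS created and how many
       * logical processors each group contains. */
      #include <windows.h>
      #include <stdio.h>

      int main(void)
      {
          WORD groups = GetActiveProcessorGroupCount();
          DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);

          printf("Active processor groups: %u\n", groups);
          printf("Total logical processors: %lu\n", total);
          for (WORD g = 0; g < groups; g++)
              printf("  group %u: %lu logical processors\n",
                     g, GetActiveProcessorCount(g));
          return 0;
      }

    (On the machine above, more than one group in this output would confirm that a group-unaware process can only see part of the 2 x 20 cores.)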

    Anybody know how to solve this?

    Regards

    Thursday, January 12, 2017 9:24 AM

Answers

  • Hi Knut,

    Thanks for the update. We made a change to MS-MPI so that when you run from the console, mpiexec will launch smpd in a way that makes it possible to launch MS-MPI processes across multiple groups as well. This change will be available in the next release of MS-MPI.

    • Marked as answer by Knut SG Tuesday, May 23, 2017 9:29 AM
    Thursday, March 23, 2017 9:39 PM

All replies

  • Within HPC Pack, the scheduler should be able to discover these cores and assign tasks to them without problems (you need to try the latest versions: HPC Pack 2012 R2 Update 3, or HPC Pack 2016).


    Qiufang Shi

    Friday, January 13, 2017 4:01 AM
  • Thanks, but that is not the problem. The OS knows 40 cores should be available, so the scheduler launches 40 processes, but they run at effectively 50%. Launching without the scheduler starts only 20 processes.

    I got a reply from the MPI team at MS, so hopefully they can figure it out.

    Tuesday, January 17, 2017 8:52 AM
  • Hi

    I sent a reply to the email you sent to our external alias (askmpi at Microsoft dot com), but I'm posting it here in case other folks run into the same issue.

    Can you run the following experiment using the job scheduler:

    Job submit /…. /jobenv:MPIEXEC_AFFINITY_TABLE=3 mpiexec -affinity -n 40 application


    The output will include a table like the one below; can you provide us with that output? (In the output below I have 4 processor groups.)


    _______________________________________________________________________________

                    - MSMPI Rank Affinity Table 3 (By Processor Group) -

    Rank        Cores
    _______________________________________________________________________________

    VEDOVAHP1:0
      00000000 +...############################################################
      00000004 ..+.############################################################
      00000008 .+..############################################################
      00000012 ...+############################################################

    VEDOVAHP1:1
      00000002 +...############################################################
      00000006 ..+.############################################################
      00000010 .+..############################################################
      00000014 ...+############################################################

    VEDOVAHP1:2
      00000001 +...############################################################
      00000005 ..+.############################################################
      00000009 .+..############################################################
      00000013 ...+############################################################

    VEDOVAHP1:3
      00000003 +...############################################################
      00000007 ..+.############################################################
      00000011 .+..############################################################
      00000015 ...+############################################################

    Note that MS-MPI processes are launched by a process manager, smpd. If you submit the job through the HPC Pack job scheduler, the HPC Pack MPI service (msmpi service) launches smpd.exe with the appropriate affinity setting, so that smpd can spawn MS-MPI children across multiple groups. If you run mpiexec directly from a command console, mpiexec launches smpd.exe itself. This scenario currently does not support multiple groups: mpiexec inherits the single-group setting from the cmd console, and smpd subsequently inherits it from mpiexec. We're looking at addressing this scenario (running in SDK mode with mpiexec rather than through HPC Pack) in the next release.
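
    (Editorial aside, not part of Anh's reply: a hedged sketch of the inheritance described above, using the documented Win32 group-affinity APIs. GetProcessGroupAffinity shows which group(s) the current process was assigned; launched from a plain console it reports exactly one. SetThreadGroupAffinity is the kind of call a group-aware process manager such as smpd can make to place work on another group. The target group number 1 below is hypothetical and assumes the machine actually has a second group:)

      #include <windows.h>
      #include <stdio.h>

      int main(void)
      {
          /* Which group(s) did this process inherit from its parent?
           * From a single-group cmd console this prints one group. */
          USHORT groups[16];
          USHORT count = 16;
          if (GetProcessGroupAffinity(GetCurrentProcess(), &count, groups)) {
              printf("This process spans %u group(s):", count);
              for (USHORT i = 0; i < count; i++)
                  printf(" %u", groups[i]);
              printf("\n");
          }

          /* Hypothetical: move the current thread onto group 1, the way a
           * group-aware launcher can spread children across groups. */
          WORD target = 1;
          DWORD n = GetActiveProcessorCount(target);   /* 0 if no group 1 */
          if (n > 0) {
              GROUP_AFFINITY ga = {0};
              ga.Group = target;
              ga.Mask = (n >= 64) ? ~(KAFFINITY)0
                                  : (((KAFFINITY)1 << n) - 1);
              if (!SetThreadGroupAffinity(GetCurrentThread(), &ga, NULL))
                  fprintf(stderr, "SetThreadGroupAffinity failed: %lu\n",
                          GetLastError());
          }
          return 0;
      }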


    Thanks

    Anh

    Friday, January 20, 2017 4:37 PM
  • An update: this came to me as a request, and the original problem owner seems to be satisfied with a BIOS setting that works around the issue. I do not have access to the hardware to follow up on this myself, so unless I hear back from them I will not be able to examine it further.

    Anyway, a big thanks to the MS-MPI team for their help. I found this information from Anh especially useful: "If you submit the job through the HPC Pack Job scheduler, the HPC Pack MPI service (msmpi service) will launch smpd.exe. The HPC Pack MPI service launches smpd with the appropriate affinity setting so that it can spawn MS-MPI children across multiple groups. If you run mpiexec directly from a command console, mpiexec will launch smpd.exe."

    Regards,

    Knut

    Monday, March 6, 2017 2:28 PM
  • Hi Anh,

    Cool. Thanks for the info.

    Tuesday, April 25, 2017 8:45 AM