Specify cores per node in an MPI application and run multiple tasks on a node


All replies

  • Hi, 

    If the scheduler doesn't support this, how about running mpiexec (MS-MPI) without using the scheduler?


    > I believe that using the cluster as above sometimes gives the best performance.
    This is based on my experience with a Linux cluster (same hardware).

    Thanks,

    ------------------
    hirakata
    Friday, October 31, 2008 4:48 AM
  • I assume that you are using Windows HPC Server 2008 (the story is a bit different for Windows Compute Cluster Server 2003).

    The natural way would be to run app1 and app2 on different machines/cores, which would be rather simple:

       job submit /numcores:16 mpiexec -n 8 mpiapp1 : -n * mpiapp2

    Assuming your nodes have 8 cores, you can drop the -n * or replace it with -n 8, with the same effect.
    (see mpiexec /help2 for details on the syntax above)

    If you really want to splice them as described above, you should let mpiexec know that each node "has" 4 cores and that you are "oversubscribing":

       job submit /numcores:16 mpiexec -cores 4 -n 8 mpiapp1 : -n * mpiapp2

    In this case mpiexec will put mpiapp1 on the two nodes, thinking that each has only 4 cores; then it will "oversubscribe" those cores with mpiapp2 (the sketch after this reply can be used to check where each rank actually lands).

    hope this helps,
    .Erez
    • Edited by Lio Friday, October 31, 2008 11:23 PM
    • Proposed as answer by Lio Sunday, November 16, 2008 5:50 PM
    Friday, October 31, 2008 11:22 PM
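
    A quick way to confirm where mpiexec actually places each section is a small test program that prints its rank and the node it runs on. Below is a minimal sketch in C using only standard MPI calls (the file name mpiapp.c is just for illustration); compile it against the MS-MPI headers and libraries and submit it with the job submit lines above:

        /* mpiapp.c - each rank prints its number and the host it runs on */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[])
        {
            int rank, size, namelen;
            char name[MPI_MAX_PROCESSOR_NAME];

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
            MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
            MPI_Get_processor_name(name, &namelen);  /* name of the node it runs on */

            printf("rank %d of %d on %s\n", rank, size, name);

            MPI_Finalize();
            return 0;
        }

    The output has one line per rank, which makes it easy to see whether the two sections landed on separate cores or were oversubscribed onto the same nodes.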
  • Hello Lio-san,

    Thank you for your reply.

    Before testing on the large cluster (8 cores x 32 nodes, WHPCS2008 RTM), I tried the following command on the small test cluster (2 cores x 8 nodes, WHPCS2008 RTM), as you described above.

    job submit /numcores:16 mpiexec -cores 1 -n 8 mpiapp arg1_for_mpiapp : -n * mpiapp arg2_for_mpiapp

    But the result is:
        Error: not enough cores left for 'mpiapp' in section 2. the command line already subscribes 8 processes on 8 cores.

    The same error occurs when I submit mpipingpong.exe as below.

    job submit /numcores:16 mpiexec -cores 1 -n 8 "C:\Program Files\Microsoft HPC Pack\Bin\mpipingpong.exe" : -n * "C:\Program Files\Microsoft HPC Pack\Bin\mpipingpong.exe" 

      Error: not enough cores left for 'C:\Program Files\Microsoft HPC Pack\Bin\mpipingpong.exe' in section 2. the command line already subscribes 8 processes on 8 cores.

    It seems that oversubscribing is not permitted.
    Is any other command-line option necessary? Or is there any other resolution for this?


    Thank you.

    ------------------
    hirakata
     
    Thursday, November 6, 2008 1:17 AM
  • Hello Hirakata-san

    The error that you get is because you specify -cores 1; this makes mpiexec think that there is only one core on each node, and thus only 8 cores in total on your 8-node test cluster, so no cores are left for the -n * section. This was my mistake in my example: you should actually not specify -n * in the second section, but rather the specific number of processes that are needed.

    you should specify 

        job submit /numcores:16 mpiexec -cores 1 -n 8 mpiapp arg1_for_mpiapp : -n 8 mpiapp arg2_for_mpiapp

    btw: this will put all the odd ranks on the first node and all the even ranks on the other; the sketch after this reply can be used to check the actual placement.

    hope this helps
    .Erez
    • Edited by Lio Friday, November 14, 2008 1:57 AM
    • Proposed as answer by Lio Saturday, November 15, 2008 8:45 AM
    • Marked as answer by Josh Barnard (Moderator) Tuesday, November 18, 2008 9:19 PM
    Friday, November 14, 2008 1:57 AM
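
    To check the rank placement described above, and to see which of the two ':'-separated sections a given process was launched from, a rank can query the MPI_APPNUM attribute. The sketch below assumes the installed MPI implementation exposes this standard MPI-2 attribute; the file name appcheck.c is just for illustration. It prints each rank's host, its mpiexec section, and the argument it received:

        /* appcheck.c - report rank, host, mpiexec section and first argument */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[])
        {
            int rank, namelen, flag;
            int *appnum;                              /* points at the attribute value */
            char name[MPI_MAX_PROCESSOR_NAME];

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Get_processor_name(name, &namelen);

            /* MPI_APPNUM is 0 for the first ':'-separated section of the
               mpiexec command line, 1 for the second, and so on. */
            MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);

            printf("rank %d on %s: section %d, arg %s\n",
                   rank, name, flag ? *appnum : -1,
                   argc > 1 ? argv[1] : "(none)");

            MPI_Finalize();
            return 0;
        }

    Running it with the command above (each section passing its own argument) makes it easy to confirm which ranks received arg1_for_mpiapp and which received arg2_for_mpiapp, and on which node each one ran.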