How to fix the number of processes on one node

  • Question

  • Hi there,

     

I have a cluster with 5 nodes; each node has 24 cores, for a total of 120 cores.

When I run my MPI code with, say, 80 processes, 8 processes run on the first node and each of the other three nodes gets 24 processes.

I would like to assign 20 processes to each of 4 nodes instead of 8 + 24 + 24 + 24.

I was wondering how I should launch the mpiexec command?

I have tried different mpiexec switches, including the following:

    1- mpiexec -np 80 -gmachinefile c:\host.txt test.exe

and in the host.txt file I wrote:

    NODE01 20

    NODE02 20

    NODE03 20

    NODE04 20

     

    and also

    2- mpiexec /host 4 NODE01 20 NODE02 20 NODE03 20 NODE04 20 test.exe

     

But unfortunately the result is the same in both cases: 8 processes assigned to NODE01 and 24 processes to each of the other nodes.

     

Any thoughts on this issue would be appreciated.

    Regards

     

    Ehsan

    Sunday, November 6, 2011 4:23 PM


All replies

  • Hello,

I was wondering which MPI stack you are using? Is it MSMPI? If so, which version? I tried it with both the -machinefile and -gmachinefile options and it looked like it was working fine.

    Could you try just run hostname.exe with the following machine file:

    NODE01 2

    NODE02 2

    NODE03 2

    NODE04 2

Also, regarding your option 2, does it work? For MSMPI, you should use the /hosts option instead of /host.

    Thanks,

    James

     

    Monday, November 7, 2011 11:13 PM
  • Dear James,

     

    Thanks for your response.

     

Regarding the MPI stack, it is MPI.NET, which actually uses MSMPI. You may find more info here:

    http://osl.iu.edu/research/mpi.net/

     

It should be mentioned that I am using Windows HPC Server 2008 R2 and I am submitting the job through the HPC 2008 R2 Cluster Manager.

     

I have run hostname.exe and it just returns the machine name. I am not sure if I understand you correctly; I just ran hostname.exe in cmd.

Please let me know if I should have done something else.

     

Regarding the second option, you are absolutely right. It is a typo; it should be /hosts, sorry.

    Thanks

    Ehsan

     



    • Edited by ehsan_ro Tuesday, November 8, 2011 1:19 AM
    Monday, November 7, 2011 11:38 PM
  • Hi Ehsan,

Sorry, I didn't make it clear. I meant running hostname.exe with mpiexec:

    mpiexec -np 8 -gmachinefile c:\host.txt hostname

Then see what the output is. It should return just 8 lines, two for each allocated node.

    Thanks,

    James

    Monday, November 7, 2011 11:56 PM
  • James,

     

Thanks for the clarification and the quick response.

Here is its output:

    NODE01
    NODE01
    NODE03
    NODE03
    NODE02
    NODE02
    NODE04
    NODE04

The problem is that when I check the allocated cores in the View Job window, it allocates cores to the nodes as follows:

    NODE01        1
    NODE02        5
    NODE03        1
    NODE04        1

I want them to be:

    NODE01        2
    NODE02        2
    NODE03        2
    NODE04        2

     

    Thanks again,

    Ehsan


    • Edited by ehsan_ro Tuesday, November 8, 2011 12:11 AM
    Tuesday, November 8, 2011 12:07 AM
  • Hi Ehsan,

From the output, it is clear that each node ran 2 processes as expected, so it looks like mpiexec did the right thing. You mentioned viewing this in a "job window"; what kind of tool is it? It seems something is wrong with it.

    Thanks,

    James

    Tuesday, November 8, 2011 12:34 AM
  • Dear James,

The following are snapshots of my Cluster Manager; it is one of the Windows tools that comes with Windows HPC Server 2008.

I should mention that some of the machine names are different from what I listed in my earlier posts, but it does not affect the results.

    Thanks,

    Ehsan

     



    • Edited by ehsan_ro Tuesday, November 8, 2011 1:18 AM
    Tuesday, November 8, 2011 1:16 AM
  • Hi Ehsan,

How did you run the job? If from the job console, could you post the job command here?

    Thanks,

    James

    Tuesday, November 8, 2011 5:53 PM
  • Thanks Ehsan!

It looks like you are using the "job" command to submit the job. If this is the case, the expected value for "Allocated cores" would be 24 per node. The job scheduler simply allocates all the resources of a node to the job, as it doesn't know the content of the machine file. This is by design.

I was wondering what command you used to submit the job? Could you post it here?

Another thing to check: could you double-check that c:\host.txt exists on each node and that the copies are identical?
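
One way to do that check, assuming the clusrun utility that ships with HPC Pack is available on the head node, might be:

clusrun /nodes:node01,node02,node03,node04 type c:\host.txt

which would print each node's copy of the file in one console for a quick comparison.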

     

    Thanks,

    James

    Tuesday, November 8, 2011 6:09 PM
  • Hi James,

     

    Thanks for your respons

I have created and run the job through the Job Management console in the Cluster Manager software, of which I attached a snapshot.

I am not sure if I understand you correctly when you say "job command".

I have exported the job as XML, and here it is:

      <?xml version="1.0" encoding="utf-8" ?>
    - <Job Version="3.000" Id="18673" State="Finished" SubmitTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" StartTime="11/8/2011 12:06:58 AM" Name="hostname" IsExclusive="false" RunUntilCanceled="false" UnitType="Core" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinCores="8" MaxCores="8" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
      <Dependencies />
    - <Tasks>
      <Task Version="3.000" Id="18678" ParentJobId="18673" RequiredNodes="HEAD,NODE01,NODE02,NODE03" State="Finished" UnitType="Core" WorkDirectory="\Services\" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18674" SubmitTime="11/8/2011 12:06:58 AM" StartTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" Name="My Task" MinCores="8" MaxCores="8" AutoRequeueCount="0" Type="Basic" />
      </Tasks>
      </Job>
    Thanks,
    Ehsan
    Tuesday, November 8, 2011 6:13 PM
  • James,

     

    I have checked the machine file on all of the nodes and they are the same.

    If from job command you mean a command to submit a job through cmd. I did not use it I just manually created the job in the cluster management program.

    Thanks

     

    Ehsan

    Tuesday, November 8, 2011 6:24 PM
  • You may try this in the job console:

    1. Open a CMD window (run as admin)

    2. Make sure you have host.txt in each node under c:\services and with the content:

    Node01 2

    Node02 2

    Node03 2

    Node04 2

    3. Submit the job with the command:

job submit /numnodes:4 /askednodes:node01,node02,node03,node04 mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe

    4. Get the job ID, and run:

    job view [jobid] /detailed

Copy and paste the last part, which contains "AllocatedCores".

     

    Thanks,

    James

    Tuesday, November 8, 2011 7:01 PM
Thanks for the instructions.

I have run it and it gives me:

     

    AllocatedCores : HEAD 24 NODE01 24 NODE02 24 NODE03 24

As you correctly mentioned in your last post, it does not see the machine file.

     

When I run the same job using the Cluster Manager, it does see the gmachinefile.

I am totally confused.

     

    Thanks,

    Ehsan

    Tuesday, November 8, 2011 7:23 PM
It seems to be working fine with the job command. Actually, it should be consistent whether you use the Job Manager UI or the console.

Could you get the job XML file for the job you submitted via the console? To do so, right-click the job and select "Export Job...". Save it somewhere and open it to compare with the job XML file for which you got the wrong output, to see whether there is any difference.

    Thanks,

    James

    Tuesday, November 8, 2011 9:43 PM
  • James,

     

Here is the XML file for the job submitted through the console:

     

      <?xml version="1.0" encoding="utf-8" ?>
    - <Job Version="3.000" Id="18676" State="Finished" SubmitTime="11/8/2011 7:15:39 PM" CreateTime="11/8/2011 7:15:32 PM" StartTime="11/8/2011 7:15:39 PM" IsExclusive="false" RunUntilCanceled="false" UnitType="Node" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinNodes="4" MaxNodes="4" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
      <Dependencies />
    - <Tasks>
      <Task Version="3.000" Id="18681" ParentJobId="18676" State="Finished" UnitType="Node" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18677" SubmitTime="11/8/2011 7:15:39 PM" StartTime="11/8/2011 7:15:39 PM" CreateTime="11/8/2011 7:15:32 PM" MinNodes="4" MaxNodes="4" AutoRequeueCount="0" Type="Basic" />
      </Tasks>
      </Job>

and this is the one which was created through the UI:
      <?xml version="1.0" encoding="utf-8" ?>
    - <Job Version="3.000" Id="18673" State="Finished" SubmitTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" StartTime="11/8/2011 12:06:58 AM" Name="hostname" IsExclusive="false" RunUntilCanceled="false" UnitType="Core" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinCores="8" MaxCores="8" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
      <Dependencies />
    - <Tasks>
      <Task Version="3.000" Id="18678" ParentJobId="18673" RequiredNodes="HEAD,NODE01,NODE02,NODE03" State="Finished" UnitType="Core" WorkDirectory="\Services\" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18674" SubmitTime="11/8/2011 12:06:58 AM" StartTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" Name="My Task" MinCores="8" MaxCores="8" AutoRequeueCount="0" Type="Basic" />
      </Tasks>
      </Job>
It looks like there are some differences between the Min/Max Cores and Nodes settings.

In the first one it is scheduled on Nodes, while in the second one it is scheduled on Cores.

What is your opinion?

Thanks,

Ehsan

    • Edited by ehsan_ro Tuesday, November 8, 2011 10:57 PM
    Tuesday, November 8, 2011 10:53 PM
  • Hi Ehsan,

The main difference between the two jobs is that one is scheduled with UnitType="Core" and the other with UnitType="Node". That's the root cause of the different output. If scheduled by "Node", the job scheduler guarantees that the whole node is reserved for the job, whereas for "Core" it only guarantees that at least one core is reserved for the job; there is no guarantee that the remaining cores are allocated evenly. That is why one node got 5 cores while the other three got one each.

The solution to your question is to schedule the job by node instead of by core.
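
In practice that means requesting whole nodes at submission time; as a sketch, along the lines of the fuller command given later in this thread:

job submit /numnodes:4 /askednodes:node01,node02,node03,node04 mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe

rather than letting the scheduler allocate by core.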

    Thanks,

    James

    Wednesday, November 9, 2011 12:46 AM
  • Thanks James for your time and kindness

     

My main question was how to get exactly the number of cores specified in the gmachinefile.

For instance, if I ask for 4 cores on NODE01, it should allocate 4 cores on that node.

     

    Regards

    Ehsan

    Wednesday, November 9, 2011 1:18 AM
  • Hi Ehsan,

    You can do that following the steps:

1. Create a machine file, say c:\services\host.txt. In host.txt, specify the number of cores for each node.

2. Make sure to copy the host.txt file to each node.

3. Submit the job using the job command with the parameters:

   job submit /numnodes:[total number of nodes] /askednodes:[list of nodes in the host.txt] /workdir:c:\services mpiexec -np [total cores] -gmachinefile host.txt myapp.exe

For example, if you have host.txt as follows and each node has a copy under c:\services:
    node01 20

    node02 20

    node03 20

    node04 20

    You can submit a job

job submit /numnodes:4 /askednodes:node01,node02,node03,node04 /workdir:c:\services mpiexec -np 80 -gmachinefile host.txt test.exe
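
After submitting, you can check the allocation in the same way as earlier in the thread:

job view [jobid] /detailed

and look at the AllocatedCores line.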

    Give it a try and please let me know whether it works as you expected.

    Thanks,

James

     

    • Marked as answer by ehsan_ro Thursday, November 10, 2011 11:10 PM
    Wednesday, November 9, 2011 4:47 AM
  • Hi James,

     

    I am sorry for delaying response I was out of town.

I have tried your instructions; unfortunately, no luck. It just allocates all the available cores on each node to the job:

    AllocatedCores : HEAD 24 NODE01 24 NODE02 24 NODE03 24

     

    Thanks,

    Ehsan

    Thursday, November 10, 2011 8:09 PM
  • Hi Ehsan,

It is by design. Once you allocate a node to a job, the job scheduler reserves all the cores on that node for the job, but it will only execute the job with the specified number of processes. You can check the CPU usage of each node to verify this. If you want to use the remaining cores for another job, you can specify the /exclusive:false option for the job. For more information about the job command and mpiexec options, you can refer to the following links:

    job submit: http://technet.microsoft.com/en-us/library/cc972834(WS.10).aspx

    mpiexec: http://technet.microsoft.com/en-us/library/cc947675(WS.10).aspx
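
As a sketch of how that option could be combined with the earlier submission (assuming the rest of the parameters stay as in the previous reply):

job submit /numnodes:4 /askednodes:node01,node02,node03,node04 /exclusive:false /workdir:c:\services mpiexec -np 80 -gmachinefile host.txt test.exe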

    Thanks,

    James

    • Marked as answer by ehsan_ro Thursday, November 10, 2011 11:05 PM
    Thursday, November 10, 2011 10:07 PM
  • Hi James,

     

You are right; it solves the problem.

Thanks for all the time you spent helping me; I greatly appreciate your kindness.

     

    Best Regards,

    Ehsan

    Thursday, November 10, 2011 11:05 PM