How to fix the number of the processes on one node

Question
-
Hi there,
I have a cluster with 5 nodes; each node has 24 cores, for a total of 120 cores.
When I run my MPI code with, say, 80 processes, 8 processes run on the first node and the other three nodes get 24 processes each.
I would like to assign 20 processes to each of 4 nodes instead of 8 + 24 + 24 + 24.
How should I launch the mpiexec command?
I have tried different mpiexec switches, including the following:
1- mpiexec -np 80 -gmachinefile c:\host.txt test.exe
and in the host.txt file I wrote
NODE01 20
NODE02 20
NODE03 20
NODE04 20
and also
2- mpiexec /host 4 NODE01 20 NODE02 20 NODE03 20 NODE04 20 test.exe
Unfortunately, the result is the same in both cases: 8 processes are assigned to NODE01 and 24 processes to each of the other nodes.
Any thoughts on this issue would be appreciated.
Regards
Ehsan
Sunday, November 6, 2011 4:23 PM
All replies
-
Hello,
Which MPI stack are you using? Is it MSMPI? If so, which version? I tried both the -machinefile and -gmachinefile options and both appeared to work fine.
Could you try running hostname.exe with the following machine file:
NODE01 2
NODE02 2
NODE03 2
NODE04 2
Also, regarding your option 2, does it work? For MSMPI, you should use the /hosts option instead of /host.
Thanks,
James
Monday, November 7, 2011 11:13 PM -
Dear James,
Thanks for your response.
Regarding the MPI stack, it is MPI.NET, which actually uses MSMPI. You can find more info here:
http://osl.iu.edu/research/mpi.net/
I should mention that I am using Windows HPC Server 2008 R2 and I am submitting the job through HPC 2008 R2 Cluster Manager.
I ran hostname.exe and it just returns the machine name. I am not sure I understood you correctly; I just ran hostname.exe in cmd.
Please let me know if I should have done something else.
Regarding the second option, you are absolutely right. It was a typo; it should be /hosts, sorry.
Thanks
Ehsan
- Edited by ehsan_ro Tuesday, November 8, 2011 1:19 AM
Monday, November 7, 2011 11:38 PM -
Hi Ehsan,
Sorry, I didn't make it clear. I meant run hostname.exe with mpiexec:
mpiexec -np 8 -gmachinefile c:\host.txt hostname
Then see what the output is. It should return 8 lines, two for each allocated node.
Thanks,
James
Monday, November 7, 2011 11:56 PM -
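As a rough illustration of the check described above, the following Python sketch counts how many times each host name appears in the mpiexec output and compares that with the machine file. The helper names (`parse_machine_file`, `count_hosts`) and the sample output are hypothetical; only the machine file layout ("node name, then process count") comes from the thread.

```python
from collections import Counter


def parse_machine_file(text):
    """Return {NODE: process_count} from '<node> <count>' lines."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            expected[parts[0].upper()] = int(parts[1])
    return expected


def count_hosts(output):
    """Count how often each host name appears, ignoring blank lines and case."""
    return Counter(line.strip().upper() for line in output.splitlines() if line.strip())


# Machine file from the thread; the output below is an illustrative sample.
machine_file = "NODE01 2\nNODE02 2\nNODE03 2\nNODE04 2\n"
sample_output = "NODE01\nNODE01\nNODE03\nNODE03\nNODE02\nNODE02\nNODE04\nNODE04\n"

expected = parse_machine_file(machine_file)
actual = count_hosts(sample_output)

for node in sorted(expected):
    status = "ok" if actual[node] == expected[node] else "MISMATCH"
    print(f"{node}: expected {expected[node]}, got {actual[node]} ({status})")
```

The order of the lines in the mpiexec output does not matter here, since only the per-host counts are compared.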
James,
Thanks for the clarification and the quick response.
Here is the output:
NODE01
NODE01
NODE03
NODE03
NODE02
NODE02
NODE04
NODE04
The problem is that when I check the allocated cores in the View Job window,
it allocates cores on the nodes as follows:
NODE01 1
NODE02 5
NODE03 1
NODE04 1
I want them to be:
NODE01 2
NODE02 2
NODE03 2
NODE04 2
Thanks again,
Ehsan
- Edited by ehsan_ro Tuesday, November 8, 2011 12:11 AM
Tuesday, November 8, 2011 12:07 AM -
Hi Ehsan,
The output clearly shows that each node ran 2 processes as expected, so it looks like mpiexec did the right thing. You mentioned viewing this in the "job window"; what kind of tool is that? It seems something is wrong with it.
Thanks,
James
Tuesday, November 8, 2011 12:34 AM -
Dear James,
The following are snapshots of my Cluster Manager; it is one of the Windows tools that comes with Windows HPC Server 2008.
I should mention that some of the machine names are different from what I listed in my previous posts, but that does not affect the results.
Thanks,
Ehsan
- Edited by ehsan_ro Tuesday, November 8, 2011 1:18 AM
Tuesday, November 8, 2011 1:16 AM -
Hi Ehsan,
How do you run the job? If from the job console, could you post the job command here?
Thanks,
James
Tuesday, November 8, 2011 5:53 PM -
Thanks Ehsan!
It looks like you are using the "job" command to submit the job. If this is the case, the expected value for "Allocated cores" is "24". The job scheduler simply allocates all the resources of a node to the job, because it doesn't know the content of the machine file. This is by design.
What command did you use to submit the job? Could you post it here?
Another thing to check: could you verify that c:\host.txt exists on each node and that the copies are identical?
Thanks,
James
Tuesday, November 8, 2011 6:09 PM -
Hi James,
Thanks for your response.
I created and ran the job through the job management console in the Cluster Manager software; I have attached a snapshot of it.
I am not sure I understand what you mean by "job command".
I have exported the job as XML and here it is:
<?xml version="1.0" encoding="utf-8" ?>
<Job Version="3.000" Id="18673" State="Finished" SubmitTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" StartTime="11/8/2011 12:06:58 AM" Name="hostname" IsExclusive="false" RunUntilCanceled="false" UnitType="Core" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinCores="8" MaxCores="8" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
<Dependencies />
<Tasks>
<Task Version="3.000" Id="18678" ParentJobId="18673" RequiredNodes="HEAD,NODE01,NODE02,NODE03" State="Finished" UnitType="Core" WorkDirectory="\Services\" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18674" SubmitTime="11/8/2011 12:06:58 AM" StartTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" Name="My Task" MinCores="8" MaxCores="8" AutoRequeueCount="0" Type="Basic" />
</Tasks>
</Job>
Thanks,
Ehsan
Tuesday, November 8, 2011 6:13 PM -
James,
I have checked the machine file on all of the nodes and the copies are identical.
If by "job command" you mean a command to submit a job through cmd, I did not use it; I just created the job manually in the cluster management program.
Thanks
Ehsan
Tuesday, November 8, 2011 6:24 PM -
You may try this in the job console:
1. Open a CMD window (run as admin)
2. Make sure you have host.txt on each node under c:\services, with the content:
Node01 2
Node02 2
Node03 2
Node04 2
3. Submit the job with the command:
job submit /numnodes:4 /askednodes:node01,node02,node03,node04 mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe
4. Get the job ID, and run:
job view [jobid] /detailed
Copy and paste the last part, which contains "AllocatedCores".
Thanks,
James
Tuesday, November 8, 2011 7:01 PM -
Thanks for the instructions.
I ran it and it gives me
AllocatedCores : HEAD 24 NODE01 24 NODE02 24 NODE03 24
As you correctly mentioned in your last post, it does not see the machine file.
When I run the same job using Cluster Manager, it sees the gmachinefile.
I am totally confused.
Thanks,
Ehsan
Tuesday, November 8, 2011 7:23 PM -
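To make the numbers above easier to compare, here is a small Python sketch that parses the "AllocatedCores" line and reports how many reserved cores would sit idle. It assumes the line has exactly the format pasted in this thread, and that 2 processes were requested on each allocated node; the function name `parse_allocated_cores` is hypothetical.

```python
def parse_allocated_cores(line):
    """Parse 'AllocatedCores : NODE n NODE n ...' into {node: cores}."""
    _, sep, value = line.partition(":")
    tokens = value.split()
    if not sep or len(tokens) % 2 != 0:
        raise ValueError(f"unexpected AllocatedCores line: {line!r}")
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


# Line as reported by "job view [jobid] /detailed" in this thread.
line = "AllocatedCores : HEAD 24 NODE01 24 NODE02 24 NODE03 24"
allocated = parse_allocated_cores(line)

# Assumption: the machine file asks for 2 processes on each allocated node.
requested = {node: 2 for node in allocated}
idle = {node: allocated[node] - requested[node] for node in allocated}

for node in allocated:
    print(f"{node}: reserved {allocated[node]}, requested {requested[node]}, idle {idle[node]}")
```

The reserved count describes what the scheduler set aside for the job, not how many MPI processes mpiexec actually started on the node.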
It seems to be working fine with the job command. Actually, it should be consistent whether you use the job manager UI or the console.
Could you get the job XML file for the job you submitted via the console? To do so, right-click the job and select "Export Job...". Save it somewhere and open it to compare with the XML file of the job that gave the wrong output, to see whether there is any difference.
Thanks,
James
Tuesday, November 8, 2011 9:43 PM -
James,
Here is the XML file for the job submitted through the console application:
<?xml version="1.0" encoding="utf-8" ?>
<Job Version="3.000" Id="18676" State="Finished" SubmitTime="11/8/2011 7:15:39 PM" CreateTime="11/8/2011 7:15:32 PM" StartTime="11/8/2011 7:15:39 PM" IsExclusive="false" RunUntilCanceled="false" UnitType="Node" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinNodes="4" MaxNodes="4" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
<Dependencies />
<Tasks>
<Task Version="3.000" Id="18681" ParentJobId="18676" State="Finished" UnitType="Node" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18677" SubmitTime="11/8/2011 7:15:39 PM" StartTime="11/8/2011 7:15:39 PM" CreateTime="11/8/2011 7:15:32 PM" MinNodes="4" MaxNodes="4" AutoRequeueCount="0" Type="Basic" />
</Tasks>
</Job>
And this is the one that was created through the UI:
<?xml version="1.0" encoding="utf-8" ?>
<Job Version="3.000" Id="18673" State="Finished" SubmitTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" StartTime="11/8/2011 12:06:58 AM" Name="hostname" IsExclusive="false" RunUntilCanceled="false" UnitType="Core" Owner="CLUSTER\Administrator" UserName="CLUSTER\Administrator" Project="" JobType="Batch" JobTemplate="Default" Priority="Normal" RequestedNodes="HEAD,NODE01,NODE02,NODE03" OrderBy="" RequeueCount="0" AutoRequeueCount="0" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" FailOnTaskFailure="false" Progress="100" ProgressMessage="" MinCores="8" MaxCores="8" NotifyOnStart="false" NotifyOnCompletion="false" xmlns="http://schemas.microsoft.com/HPCS2008R2/scheduler/">
<Dependencies />
<Tasks>
<Task Version="3.000" Id="18678" ParentJobId="18673" RequiredNodes="HEAD,NODE01,NODE02,NODE03" State="Finished" UnitType="Core" WorkDirectory="\Services\" NiceId="1" CommandLine="mpiexec -np 8 -gmachinefile c:\services\host.txt hostname.exe" RequeueCount="0" PendingReason="None" StartValue="0" EndValue="0" IncrementValue="1" GroupId="18674" SubmitTime="11/8/2011 12:06:58 AM" StartTime="11/8/2011 12:06:58 AM" CreateTime="11/8/2011 12:06:35 AM" Name="My Task" MinCores="8" MaxCores="8" AutoRequeueCount="0" Type="Basic" />
</Tasks>
</Job>
It looks like there are some differences between the Min and Max Cores and Nodes.
In the first one it is running on Nodes, while in the second one it is running on Cores.
What is your opinion?
Thanks
Ehsan
- Edited by ehsan_ro Tuesday, November 8, 2011 10:57 PM
Tuesday, November 8, 2011 10:53 PM -
Hi Ehsan,
The main difference between the two jobs is that one is scheduled with UnitType="Core" and the other with UnitType="Node". That is the root cause of the different output. When a job is scheduled by node, the job scheduler guarantees that the whole node is reserved for the job, whereas when it is scheduled by core, the scheduler guarantees only that at least one core is reserved for the job. There is no guarantee that the remaining cores are allocated evenly, which is why the first node got 5 while the other three got one each.
The solution to your question is to schedule the job by node instead of by core.
Thanks,
James
Wednesday, November 9, 2011 12:46 AM -
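For comparing two exported job files quickly, a sketch along these lines could pull out the scheduling unit and the requested amount. The two XML strings are heavily trimmed stand-ins for the exports posted above (only the attributes needed here are kept), and `resource_request` is a hypothetical helper, not part of any HPC tool.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/HPCS2008R2/scheduler/"

# Trimmed stand-ins for the two exported jobs shown in the thread.
UI_JOB = f'<Job xmlns="{NS}" Id="18673" UnitType="Core" MinCores="8" MaxCores="8" />'
CONSOLE_JOB = f'<Job xmlns="{NS}" Id="18676" UnitType="Node" MinNodes="4" MaxNodes="4" />'


def resource_request(job_xml):
    """Return (unit_type, minimum, maximum) from a job's root element."""
    root = ET.fromstring(job_xml)
    unit = root.get("UnitType")          # "Core" or "Node" in these exports
    suffix = unit + "s"                  # Core -> MinCores, Node -> MinNodes
    return unit, int(root.get("Min" + suffix)), int(root.get("Max" + suffix))


for name, xml_text in (("UI job", UI_JOB), ("console job", CONSOLE_JOB)):
    unit, lo, hi = resource_request(xml_text)
    print(f"{name}: scheduled by {unit}, min {lo}, max {hi}")
```

The attributes carry no namespace prefix, so they can be read with plain names even though the element itself lives in the scheduler namespace.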
Thanks James for your time and kindness
My main question was how to get exactly the number of cores requested in the gmachinefile.
For instance, if I ask for 4 cores on Node01, it should allocate 4 cores on that node.
Regards
Ehsan
Wednesday, November 9, 2011 1:18 AM -
Hi Ehsan,
You can do that with the following steps:
1. Create a machine file, say c:\services\host.txt. In host.txt, specify the number of cores for each node.
2. Make sure to copy the host.txt file to each node.
3. Submit the job using the job command with these parameters:
job submit /numnodes:[total number of nodes] /askednodes:[list of nodes in the host.txt] /workdir:c:\services mpiexec -np [total cores] -gmachinefile host.txt myapp.exe
For example, suppose you have host.txt as follows and each node has a copy under c:\services:
node01 20
node02 20
node03 20
node04 20
You can submit the job with:
job submit /numnodes:4 /askednodes:node01,node02,node03,node04 /workdir:c:\services mpiexec -np 80 -gmachinefile host.txt test.exe
Give it a try and please let me know whether it works as you expected.
Thanks,
james
- Marked as answer by ehsan_ro Thursday, November 10, 2011 11:10 PM
Wednesday, November 9, 2011 4:47 AM -
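The steps above can be sketched in Python as a way to keep the machine file and the command line consistent with each other. The function names are hypothetical, and the node names, core counts, and paths are simply the example values from this thread; the sketch only builds strings and does not submit anything.

```python
def machine_file_text(cores_per_node):
    """Return machine file content: one '<node> <cores>' line per node."""
    return "\n".join(f"{node} {cores}" for node, cores in cores_per_node.items()) + "\n"


def job_submit_command(cores_per_node, workdir, machine_file, app):
    """Build a 'job submit' line that asks for the listed nodes and runs mpiexec."""
    nodes = ",".join(cores_per_node)
    total = sum(cores_per_node.values())  # as in the thread: 4 x 20 = 80 processes
    return (
        f"job submit /numnodes:{len(cores_per_node)} /askednodes:{nodes} "
        f"/workdir:{workdir} mpiexec -np {total} -gmachinefile {machine_file} {app}"
    )


# Example layout from the thread: 20 processes on each of 4 nodes.
layout = {"node01": 20, "node02": 20, "node03": 20, "node04": 20}

print(machine_file_text(layout), end="")
print(job_submit_command(layout, r"c:\services", "host.txt", "test.exe"))
```

Deriving both the file content and the -np value from one dictionary avoids the two drifting apart when the layout changes.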
Hi James,
I am sorry for the delayed response; I was out of town.
I have tried your instructions but unfortunately had no luck. It just allocates all the available cores on each node to the job:
AllocatedCores : HEAD 24 NODE01 24 NODE02 24 NODE03 24
Thanks,
Ehsan
Thursday, November 10, 2011 8:09 PM -
Hi Ehsan,
It is by design. When you allocate a node to a job, the job scheduler reserves all the cores on that node for the job, but it only runs the job with the specified number of cores. You can check the CPU usage of each node to confirm this. If you want to use the remaining cores for another job, you can specify the /exclusive:false option for the job. For more information about the job command and mpiexec options, refer to the following links:
job submit: http://technet.microsoft.com/en-us/library/cc972834(WS.10).aspx
mpiexec: http://technet.microsoft.com/en-us/library/cc947675(WS.10).aspx
Thanks,
James
- Marked as answer by ehsan_ro Thursday, November 10, 2011 11:05 PM
Thursday, November 10, 2011 10:07 PM -
Hi James,
You are right, it solves the problem.
Thanks for all the time you spent helping me; I greatly appreciate your kindness.
Best Regards,
Ehsan
Thursday, November 10, 2011 11:05 PM