How to use only one core of a node?

Question
-
Hello, I am a new user of a Windows HPC cluster consisting of 3 dual-socket, dual-core nodes, i.e. each node has 4 cores. Up to now I have only worked with Linux clusters, so I first have to get used to the job manager. At the moment I intend to compare the Linux and Windows clusters with the Intel MPI benchmark and want to test different constellations of nodes and cores. But I am wondering how it is possible to start only ONE process per node, i.e. use only ONE core per node, so that a 3-process job uses all 3 nodes and not just 3 cores of one node. The mpiexec man page specifies the -pernode option, but this option doesn't work on the cluster if written on the command line (after mpiexec) in the Task List. I also played around with different settings for maximum cores when creating a new job template, but without the desired success. I would be very happy if anybody could give me a hint on how to solve this problem.
Many thanks, parsus
Wednesday, September 10, 2008 10:49 AM
Answers
-
Hello, Parsus.
Yes, the job scheduler on HPC Server (or CCS) is a bit different from running MPI on a set of Linux nodes. The key difference is that the job scheduler, rather than mpiexec, takes the primary role in assigning resources. However, there are cases where job scheduler and mpiexec arguments can be used together to better control the process placement of your MPI application.
The simplest means of running a single process on each of N nodes is with node-based scheduling (/numnodes:) like this:
job submit /numnodes:3 mpiexec imb.exe
which would run the MPI application named "imb.exe" on 3 nodes with one process (MPI rank) per node. The HPCS scheduler allows you to schedule by node, socket, or core.
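For reference, the analogous socket- and core-based requests would look like the following. This is a sketch assuming the /numsockets and /numcores options of the HPC Server 2008 "job" command, so verify against the version on your cluster:
job submit /numsockets:3 mpiexec imb.exe
job submit /numcores:3 mpiexec imb.exe
Note that the scheduler may satisfy /numcores:3 with 3 cores on a single node, which is exactly the behavior the original question was trying to avoid.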
You can also combine job scheduler and mpiexec arguments to more closely control the process placement of your application. For example:
job submit /numnodes:3 mpiexec -cores 2 imb.exe
will run 2 MPI processes on each of 3 nodes for a total of 6 MPI ranks for the job.
Note that mpiexec's "-affinity" argument can be used to separate processes on a node to avoid contention (and the resulting memory swapping and poor performance). The -affinity option causes mpiexec to place the processes in such a way that no 2 processes share the same: L1 cache, L2 cache, Lx cache, physical package, NUMA node (listed in order of precedence).
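Combining the two, a sketch (assuming the -affinity flag as documented for MS-MPI's mpiexec):
job submit /numnodes:3 mpiexec -cores 2 -affinity imb.exe
would run 2 ranks on each of 3 nodes, with each rank pinned so that the two ranks on a node avoid sharing a cache or package where the hardware layout allows.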
IMPORTANT NOTE: You mentioned "the -pernode option...", by which I believe you intended the "/corespernode" argument. Please be aware that /corespernode is a requirement, not a resource request. For example,
job submit /corespernode:4 /numnodes:2 app.exe
will run a single process on each of 2 nodes, where each of those nodes must have at least 4 cores.
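So, to guarantee that the one-rank-per-node benchmark lands only on 4-core nodes, the two styles can be combined. A sketch, using the same flags as above:
job submit /corespernode:4 /numnodes:3 mpiexec imb.exe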
Hope this helps.
Eric
Eric Lantz (Microsoft)
- Proposed as answer by elantz (Microsoft employee), Tuesday, September 16, 2008 7:28 PM
- Marked as answer by parsus, Thursday, September 18, 2008 11:39 AM
Tuesday, September 16, 2008 7:28 PM
-
We've gotten a few questions about this... so I've posted this blog entry about how to do process placement with Windows HPC Server 2008. Check it out and let me know if it helps (or is wrong in any way!):
https://windowshpc.net/Blogs/jobscheduler/Lists/Posts/Post.aspx?ID=9
Thanks!
Josh
-Josh
- Proposed as answer by Josh Barnard (Moderator), Wednesday, September 17, 2008 12:43 AM
- Marked as answer by parsus, Thursday, September 18, 2008 11:39 AM
Wednesday, September 17, 2008 12:43 AM (Moderator)
-
Many thanks to both of you!!! Your detailed explanations helped a lot. Jobs can now be easily started via cmd or PowerShell. Within the Job Manager, selecting e.g. 'Job resources': Nodes, Min: 2, Max: 2 replaces the explicit option /numnodes:2, while '-cores' on the command line specifies the number of cores.
My only concern is the following: If I start a job
job submit /numnodes:2 mpiexec -cores 2 Job.exe
on two nodes of a cluster with 3 dual-socket, dual-core machines, i.e. 4 cores per node, and have a look at 'Heat Map/Cores in Use', I would expect to see 2 busy cores on each of the 2 nodes. But the heat map indicates that 4 cores per node are working on 2 nodes (no other jobs are running!), although the output of the job clearly indicates that only a total of 4 processes (not 8) worked together. In contrast, 'Heat Map/CPU Usage' behaves as expected: only a maximum of 50% is reached on both nodes.
Many greetings, Parsus
The "cores in use" refers to the cores that have been allocated by the scheduler, not necessarily those being used by your application. In your case, since you requested the whole nodes, all cores are assigned to your job.
-Josh
- Marked as answer by Josh Barnard (Moderator), Thursday, May 7, 2009 5:31 PM
Thursday, May 7, 2009 5:30 PM (Moderator)
All replies
-
Please note that the -cores switch only applies to MS-MPI and not to other MPI implementations.
As to your question about the heat map:
"Cores in Use" is the number of cores allocated for the job; the scheduler indeed allocated all cores to your job, and thus all 4 show up in the heat map.
"CPU Usage" indicates what percentage of the CPUs is actually used, and as you expected that is 50%.
Thanks,
.Erez
- Edited by Lio, Monday, September 22, 2008 10:34 PM
Monday, September 22, 2008 10:34 PM
-
Hi,
Is it possible to specify one core on a node in CCS 2003?
Thanks,
Kenji
Wednesday, October 1, 2008 12:50 PM
-
Hi,
Is it possible to run 2 tasks/jobs at the same time as follows?
MPIapp1: 4 cores of Node1 and 4 cores of Node2
MPIapp2: 4 cores of Node1 and 4 cores of Node2
- each node has 2 quad-core processors
I couldn't find how to run those jobs/tasks at the same time (not sequentially).
I think "mpiexec -cores" does not work when the UnitType is not Node.
And if the UnitType is Node, multiple tasks cannot run on the same node.
Thanks,
Friday, October 31, 2008 3:21 AM
-
I don't think this is possible today with separate jobs. If you really want this, the best way is to create 1 job with a single task, where that task is an mpiexec command that starts two separate applications.
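For example, a sketch using mpiexec's standard colon syntax for launching multiple executables from one command (supported by MS-MPI; adjust the rank counts to your allocation, and note MPIapp1.exe/MPIapp2.exe stand in for your applications):
job submit /numnodes:2 mpiexec -n 4 MPIapp1.exe : -n 4 MPIapp2.exe
Be aware that both executables started this way share a single MPI_COMM_WORLD, so this suits applications written to cooperate (or to split the communicator) rather than two fully independent runs.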
Thanks,
Josh
-Josh
Thursday, May 7, 2009 5:32 PM (Moderator)
-
By the way, the link above seemed broken. Here is an updated link:
http://blogs.technet.com/windowshpc/archive/2008/09/16/mpi-process-placement-with-windows-hpc-server-2008.aspx
Thanks!
J
-Josh
Thursday, May 7, 2009 5:32 PM (Moderator)
-
Hi all,
If I submit a job with "job submit /numnodes:2 mpiexec -cores 1 job.exe", will it use one core from each node?
And I have another query: how can I submit a 6-core job by selecting 2 cores from one node and 4 cores from another node? If it is possible, please give the command.
Regards,
Kalyan
Monday, June 15, 2009 5:02 PM