Affinity Mode in R2

Answers

  • Worked with Colin offline. Our application needs affinity left unset on all of its processes, so we used the following setting, which configures the whole cluster to work this way:

    cluscfg setparams /scheduler:colinwmain affinitytype=NoJobs

    Other options are:

    Affinity is set for all processes in all jobs:

    cluscfg setparams /scheduler:colinwmain affinitytype=alljobs

    Affinity is set only for jobs that do not take a whole machine exclusively (the default):

    cluscfg setparams /scheduler:colinwmain affinitytype=NonExclusiveJobs

    Cheers,

    -Scott

    Tuesday, August 3, 2010 8:43 PM
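Once the cluster is switched to affinitytype=NoJobs, tasks should start with no affinity set. A worker can confirm this at startup by comparing its process affinity mask with the system mask. This is a minimal sketch, assuming a single processor group (at most 64 logical processors); CountCores, AffinityIsUnrestricted and QueryAffinity are hypothetical helper names, while GetProcessAffinityMask and GetCurrentProcess are the standard Win32 calls. The mask logic is portable; the Win32 part compiles only on Windows.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Number of cores enabled in an affinity mask (one bit per logical processor).
inline int CountCores(std::uint64_t mask) {
    int n = 0;
    for (; mask != 0; mask &= mask - 1) ++n;  // clear the lowest set bit each pass
    return n;
}

// True when the process may run on every core the system exposes to it,
// i.e. nothing (such as the HPC scheduler) has narrowed its affinity.
inline bool AffinityIsUnrestricted(std::uint64_t processMask, std::uint64_t systemMask) {
    return processMask == systemMask;
}

#ifdef _WIN32
// Windows-only: read this process's masks. Assumes a single processor group.
inline bool QueryAffinity(std::uint64_t& processMask, std::uint64_t& systemMask) {
    DWORD_PTR proc = 0, sys = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &proc, &sys)) return false;
    processMask = proc;
    systemMask = sys;
    return true;
}
#endif
```

A worker might log CountCores(processMask) next to CountCores(systemMask) at startup, so a task that is still pinned to one core shows up immediately in the logs.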

All replies

  • The new feature is available when HPC V3 is installed on a compute node running Windows Server 2008 R2.

    The normal Windows APIs for setting affinity on processes or threads can be used unchanged.

    Normally this isn't an issue except for tasks that are assigned multiple cores but not all the cores on a node. Typically these are MPI programs, where several ranks may run on part of a compute node.


    -Colin, Microsoft HPC
    Monday, July 12, 2010 4:20 AM
    Moderator
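Colin's point that the normal Windows APIs work unchanged on R2 nodes can be illustrated with a small sketch: a process narrows itself to some of the cores it was started with, so the new mask is always a subset of what the Scheduler assigned. LowestCores and RestrictToSubset are hypothetical names, and a single processor group is assumed; only GetProcessAffinityMask, SetProcessAffinityMask and GetCurrentProcess are real Win32 calls.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Keep only the lowest `count` cores that are set in `allowed`.
// The result is always a subset of `allowed`; if `allowed` has fewer
// than `count` cores, all of them are kept.
inline std::uint64_t LowestCores(std::uint64_t allowed, int count) {
    std::uint64_t result = 0;
    for (int i = 0; i < count && allowed != 0; ++i) {
        std::uint64_t lowest = allowed & (~allowed + 1);  // isolate lowest set bit
        result |= lowest;
        allowed &= allowed - 1;                            // then clear it
    }
    return result;
}

#ifdef _WIN32
// Windows-only: narrow this process to `count` of the cores it started with.
inline bool RestrictToSubset(int count) {
    DWORD_PTR processMask = 0, systemMask = 0;
    HANDLE self = GetCurrentProcess();
    if (!GetProcessAffinityMask(self, &processMask, &systemMask)) return false;
    std::uint64_t subset = LowestCores(processMask, count);
    if (subset == 0) return false;  // nothing to run on
    return SetProcessAffinityMask(self, static_cast<DWORD_PTR>(subset)) != FALSE;
}
#endif
```

Deriving the new mask from the process mask rather than from the system mask is the design choice that matters here: it never asks for cores that belong to another task on the same node.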
  • Colin,

    On our HPC 2008 SP1 system I added code to our C++ worker process to set the affinity mask, but the call fails and the mask is still set to one core.

    I've run the same executable from a command shell, under the same user account, both on my dev box and on a cluster node, and there the call succeeds. Only when the process is launched by HPC does the SetProcessAffinityMask() call fail, leaving the process locked to one core.

    1. What is HPC doing that prevents our process from setting its own affinity? (I don't think it is permissions, since it works with the same user account when I run the command from cmd.exe.)

    2. What can be done to unlock this?

    Below is the code that I'm using to set the affinity mask to use all cores on the compute node.

    Thank you.

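    // Requires <windows.h>, <string> and <boost/lexical_cast.hpp>;
    // LOGINFO and LOGWARN are our own logging macros.

    // Log how many logical processors the OS reports.
    SYSTEM_INFO sysinfo;
    GetSystemInfo( &sysinfo );
    DWORD numCPU = sysinfo.dwNumberOfProcessors;
    LOGINFO( "System has " + boost::lexical_cast<std::string>(numCPU) + " cores" );

    // GetProcessAffinityMask() takes PDWORD_PTR, so the masks must be
    // DWORD_PTR; plain DWORD does not compile for x64.
    DWORD_PTR proc_mask = 0;
    DWORD_PTR system_mask = 0;

    // The GetCurrentProcess() pseudo-handle needs no OpenProcess()/CloseHandle().
    HANDLE hProcess = GetCurrentProcess();
    if( GetProcessAffinityMask( hProcess, &proc_mask, &system_mask ) )
    {
        LOGINFO( "Startup Processor Affinity->Process:" + boost::lexical_cast<std::string>(proc_mask) + " System:" + boost::lexical_cast<std::string>(system_mask) );

        // Try to widen the process to every core in the system mask,
        // capturing the error code immediately if the call fails.
        BOOL success = SetProcessAffinityMask( hProcess, system_mask );
        DWORD error = success ? 0 : GetLastError();

        // Read the masks back to see what actually took effect.
        proc_mask = 0;
        system_mask = 0;
        GetProcessAffinityMask( hProcess, &proc_mask, &system_mask );
        LOGINFO( "Reset Processor Affinity->Process:" + boost::lexical_cast<std::string>(proc_mask) + " System:" + boost::lexical_cast<std::string>(system_mask) + " SetSuccess:" + boost::lexical_cast<std::string>(success) + " Error:" + boost::lexical_cast<std::string>(error) );
    }
    else
    {
        LOGWARN( "Could not query process affinity." );
    }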
    
    Monday, July 12, 2010 5:03 PM
  • Scott,

    I think I can explain what is going on. When two tasks share an 8-core node, for example with 4 cores assigned to each task, HPC assigns which cores the processes in each task can run on. Your example appears to be trying to widen the cores the Scheduler assigned to your process to include the ones assigned to the other task, which could adversely affect that task.

    The feature I was describing allows a process running in a task that has 4 cores assigned to it to run on a subset of those cores. This can be useful when the task runs several CPU-bound processes simultaneously. That doesn't seem to be what you are trying to do, however.

    If you submit with job submit /numsockets:1 or job submit /numnodes:1, the Scheduler will assign the task more than one core. Is this closer to what you need?

    If you need to overcommit the cores then setting the AffinityMode to NoJobs as described in the post you referenced will cause the Scheduler and NodeManager to start your tasks with no affinity set and leave you with the responsibility of assigning affinity if it is necessary.


    -Colin, Microsoft HPC
    Wednesday, July 14, 2010 6:13 PM
    Moderator
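Colin's case of several CPU-bound processes sharing one multi-core task could be handled by giving each worker its own core from the task's mask. In this minimal sketch, worker number k picks the k-th core of the mask, wrapping around when there are more workers than cores. NthCore and the worker-index scheme are hypothetical; each worker would read its mask with GetProcessAffinityMask and pass the result to SetProcessAffinityMask(GetCurrentProcess(), ...), which per this thread is expected to work on Windows Server 2008 R2 compute nodes.

```cpp
#include <cstdint>

// Mask selecting the (index % n)-th set bit of `taskMask`, where n is the
// number of cores in the task mask. Returns 0 if `taskMask` is empty.
inline std::uint64_t NthCore(std::uint64_t taskMask, unsigned index) {
    unsigned n = 0;
    for (std::uint64_t m = taskMask; m != 0; m &= m - 1) ++n;  // count cores
    if (n == 0) return 0;
    index %= n;                                     // wrap extra workers around
    std::uint64_t m = taskMask;
    for (unsigned i = 0; i < index; ++i) m &= m - 1;  // drop the lowest cores
    return m & (~m + 1);                              // isolate the next one
}
```

For a task assigned cores 4 to 7 (mask 0xF0), workers 0 to 3 land on one core each, and a fifth worker would share core 4 with worker 0.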
  • Colin,

    We do want to over-commit the cores: our process runs one CPU-intensive thread plus other threads for side operations, mostly pushing data back into the database. Over-committing lets us use cores that already have a process on them when that process is blocked waiting for I/O.

    From what I see in the documentation, AffinityMode is a new feature in R2 and is not available in SP1.

    1. Is there a way in SP1 to either:

       A. not have HPC set the affinity on the process, or

       B. reset the core affinity from C++ (I must be running into some OS security restriction I'm not aware of)?

    As stated above, the usual Windows calls to set the affinity mask fail from a process launched by HPC, but the same code works from a process started by a user.

    2. Do you have a quick code example for how AffinityMode can be set on the Job/Task? I don't see any way to set it in the documentation or from looking at the R2 RC1 binaries.

    Thank you,

    -Scott

    Wednesday, July 14, 2010 8:38 PM
  • Hi Scott,

    Can you email me directly (colinw at microsoft.com) so we can resolve this for you as quickly as possible?


    -Colin, Microsoft HPC
    Thursday, July 15, 2010 12:08 AM
    Moderator