Clarification on Prioritisation algorithm

  • Question

  • Hi all,

    Similar to another thread here, I am trying to understand the prioritisation and scheduling algorithm employed by HPC, as part of an architecture proof of concept we are doing to decide whether to use HPC for our Monte Carlo risk application.

    We have a business requirement to allow high- and low-priority jobs to run on our HPC cluster concurrently. These jobs will each contain thousands of tasks. Our desire is for the higher-priority work to complete before the lower-priority work, and for the higher-priority work to get the most resources from the cluster, irrespective of when it is submitted.

    We have a two node HPC cluster, with each node possessing 8 cores. 

    To mimic this situation I have put together the following application, which is deployed to the cluster. It simply sleeps the active thread for a user-supplied period of time. It is deliberately trivial.

    using System;
    using System.Threading;

    namespace DeployedApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Sleep for the number of seconds given as the first argument.
                Thread.Sleep(int.Parse(args[0]) * 1000);
            }
        }
    }
    The attached client program simultaneously submits two jobs that run this console app in two ways:

    i)   low priority long running work (passing a longer sleep time) 
    ii)  high priority short running work (passing a short sleep time).

    private void TestPrioritisation()
    {
        // Connect to the cluster head node.
        Scheduler scheduler = new Scheduler();
        scheduler.Connect("HEADNODE");

        // Job 1: lowest priority, 1000 long-running (10 s) tasks.
        ISchedulerJob job = scheduler.CreateJob();
        job.Name = "Lowest Priority Long Run Time.";
        job.Project = "Test Project";
        job.IsExclusive = false;
        job.RunUntilCanceled = false;
        job.Runtime = (int)TimeSpan.FromMinutes(30).TotalSeconds;
        job.Priority = JobPriority.Lowest;
        job.CanPreempt = true;

        for (int i = 0; i < 1000; i++)
        {
            ISchedulerTask task = job.CreateTask();
            task.WorkDirectory = "C:\\Program Files\\Richard";
            task.CommandLine = "DeployedApplication.exe 10";
            task.MinimumNumberOfCores = 1;
            job.AddTask(task);
        }

        scheduler.SubmitJob(job, null, null);

        // Job 2: highest priority, 16 short-running (1 s) tasks.
        ISchedulerJob job2 = scheduler.CreateJob();
        job2.Name = "Highest Priority Short Run Time.";
        job2.Project = "Test Project";
        job2.IsExclusive = false;
        job2.RunUntilCanceled = false;
        job2.Runtime = (int)TimeSpan.FromMinutes(5).TotalSeconds;
        job2.Priority = JobPriority.Highest;

        for (int i = 0; i < 16; i++)
        {
            ISchedulerTask task = job2.CreateTask();
            task.WorkDirectory = "C:\\Program Files\\Richard";
            task.CommandLine = "DeployedApplication.exe 1";
            task.MinimumNumberOfCores = 1;
            job2.AddTask(task);
        }

        scheduler.SubmitJob(job2, null, null);
    }



    The behaviour I was expecting was that as the low-priority tasks started to finish, the higher-priority tasks would start to execute and, most importantly, gain an increasing share of cluster resources (grow/shrink etc. are all enabled in the scheduler configuration, as is backfill from the queue).

    The behaviour actually observed was that the highest-priority job's tasks did begin to execute, but never received more than 1 core of the cluster's resources on one node, whereas the low-priority work held 8 cores on each node and continued to run its tasks.

    Could this behaviour be explained, please?

    Thanks

    Richard

    Tuesday, January 5, 2010 12:03 PM

Answers

  • Hi Richard,

    The best way to ensure that a higher priority job will get the majority of cores when the other running and queued jobs are of lower priority is to set the Minimum resources for the higher priority job appropriately. The graceful pre-emption algorithm does not keep shrinking other jobs once the higher priority job has enough resources to start. So, if your higher priority job requires a minimum of 1 core to start, then it will get at least 1 core. But it may get more than 1 core depending on what's available at the instant the job is scheduled. The approach you suggested might work, but cluster resource usage is dynamic and current usage is not necessarily a good guide to the future usage when you submit the higher priority job. Probably the best approach is to determine the minimum resources the higher priority job actually needs based on your knowledge of the tasks in the job and how long you wish it to take, rather than based on how busy the cluster currently is.
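
    As a rough sketch of that calculation (EstimateMinimumCores is a hypothetical helper, not part of the HPC API; the task counts and timings come from the repro above):

    ```csharp
    using System;

    static class CoreEstimator
    {
        // Hypothetical helper: derive a job's minimum core count from what
        // we know about its tasks and how long we are willing to wait,
        // rather than from the cluster's current load.
        public static int EstimateMinimumCores(int taskCount, double secondsPerTask,
                                               double targetSeconds, int clusterCores)
        {
            // Total CPU-seconds of work divided by the target wall-clock time.
            int cores = (int)Math.Ceiling(taskCount * secondsPerTask / targetSeconds);
            return Math.Min(Math.Max(cores, 1), clusterCores);
        }
    }

    // For the 16 one-second tasks in the repro, finishing within ~4 seconds
    // on the 16-core cluster needs ceil(16 * 1 / 4) = 4 cores:
    //
    //   job2.AutoCalculateMin = false;
    //   job2.MinimumNumberOfCores =
    //       CoreEstimator.EstimateMinimumCores(16, 1.0, 4.0, 16);  // 4
    ```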

    Regards,

    Patrick
    Thursday, January 7, 2010 11:28 PM

All replies

  • Hi Richard,

    By default, MaximumNumberOfCores is 1; see the documentation: http://msdn.microsoft.com/en-us/library/microsoft.hpc.scheduler.ischedulerjob.maximumnumberofcores(VS.85).aspx

    For your code, you want to set the resources on both the low- and high-priority jobs, like:
      job.MinimumNumberOfCores = 1;
      job.MaximumNumberOfCores = 16; // total number of cores on all the compute nodes of the cluster

    Can you try this?

    Liwei
    Tuesday, January 5, 2010 8:50 PM
  • Hi Liwei, I will try this tomorrow in the office (we are in the UK). Should I explicitly set core-level numbers on the individual tasks as well as on the jobs? In theory my requirement is a generic one, so it would be interesting to hear what scheduling behaviour you would expect to see from a two-node HPC cluster.
    Tuesday, January 5, 2010 9:13 PM
  • Hi Richard,

    I've tried your code and I think the behavior you're seeing is expected. It's true that the default value of MaximumNumberOfCores for a job is 1, as Liwei mentioned; however, the default value of AutoCalculateMin for a job is True, which negates any setting for MaximumNumberOfCores. This means that the job is essentially NumCores:*-*, which is what I think you had assumed. In that case, this tells the scheduler that the HighestPriority job can run on as few as 1 core, and that's what it gets in this case.

    If you would like to see your HighestPriority job always take more than 1 core away from your LowestPriority job, then you can try setting the MinimumNumberOfCores for the HighestPriority job to be, say, 4 (plus set the job's AutoCalculateMin property to false) like this:

    job2.AutoCalculateMin = false;
    job2.MinimumNumberOfCores = 4;
    


    Regards,

    Patrick
    Tuesday, January 5, 2010 10:56 PM
  • Hi Patrick,

    Thanks. I will try what you've suggested when I return to the office tomorrow in the UK.

    Extrapolating a little further: if our requirement is always for higher-priority work to get the majority of cores when running alongside a lower-priority run, what would be the best way to achieve this? My initial thought is that at the time of high-priority job submission, we check whether any low-priority jobs are executing, and if so apply some kind of percentage algorithm against the number of cores in the cluster and use the result as the minimum cores value. Does this sound like a sensible approach?

    Thanks

    Richard
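
    Roughly, the idea sketched in code (Scheduler.GetCounters().TotalCores is assumed to report the cluster's total core count, and the 75% share is arbitrary):

    ```csharp
    using System;
    using Microsoft.Hpc.Scheduler;

    class SubmitHighPriority
    {
        static void Main()
        {
            Scheduler scheduler = new Scheduler();
            scheduler.Connect("HEADNODE");

            // Assumed: GetCounters().TotalCores reports the cluster's core count.
            int totalCores = scheduler.GetCounters().TotalCores;

            // Claim an arbitrary 75% share for the high-priority job.
            int minCores = Math.Max(1, (int)(totalCores * 0.75));

            ISchedulerJob highJob = scheduler.CreateJob();
            highJob.Name = "Highest Priority Short Run Time.";
            highJob.Priority = JobPriority.Highest;
            highJob.AutoCalculateMin = false;
            highJob.MinimumNumberOfCores = minCores;  // e.g. 12 of 16 cores
            // ... add tasks as in the earlier example, then submit:
            scheduler.SubmitJob(highJob, null, null);
        }
    }
    ```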
    Tuesday, January 5, 2010 11:15 PM
  • Hi Patrick, Liwei,

    Thanks very much for all of your help.

    I think we are at a point now where we understand how we can combine the prioritisation and core allocation properties to control the execution order of work in our HPC cluster.

    My only comment echoes one made by another forum poster, concerning the intuitiveness of this model and the efficacy of the grow and shrink policies.

    I'll mark this thread as answered!

    Thanks very much

    Richard
    Monday, January 18, 2010 5:06 PM