Limited concurrency?

  • Question

  • I'm trying to figure out if the Windows HPC Job Scheduler addresses my requirements.

    1. Users will submit jobs that, in general, should execute in FCFS order, though
    there may be some prioritization. It's OK for several jobs to execute
    concurrently, but it's also important that not too many of them do, because each
    job imposes a load on the database server, and we must reserve some DBMS capacity
    in order to provide good response time to real-time tasks.

    How can I configure jobs and the scheduler to allow a limited amount of concurrency?

    2. In a cluster, must the compute nodes be HPC Server 2008 machines? Or only the head node?




    Leo Tohill
    Monday, October 20, 2008 2:01 AM


All replies


  • Question 1:
    The reason for a cluster is to execute a lot of jobs in parallel - if all your jobs need to talk to a database running on only one server, that's a bottleneck.
    So, if your database server can serve only, let's say, 20 client jobs at one time, it makes no sense at all to have more than 20 processors in your cluster - unless you switch to a clustered database.
    The HPC scheduler only runs jobs in parallel if enough resources (cores/sockets/nodes) are available. For example, if your cluster has only two nodes and your users submit jobs that each need one node, you won't have more than two jobs running at once.

    Question 2:
    Normally, you install the head node as Windows HPC Server 2008 (first install the Windows Server 2008 operating system, then install the HPC Pack).
    In a typical scenario, the head node installs the compute nodes automatically: it uses Windows Deployment Services to install the OS, and then it installs and configures the HPC Pack so that the computer becomes a compute node.
    You cannot use Vista or XP for compute nodes, if that is what you mean. But you can install compute nodes manually (install Server 2008 and the HPC Pack) - I've never tried this.


    What do you mean by FCFS?
    Monday, October 20, 2008 10:22 AM
  • Sorry: FCFS means "first come, first served".

    It appears that I could use the "core" resource as a rough constraint on concurrency. Of the scheduling constraints available (core, socket, node, and I think memory), core is the finest-grained. But still, due to the I/O characteristics of the process, it will be only roughly correct. I wish that I could classify jobs (class A, B, C, ...) and then define rules such as "max 3 concurrent Class A jobs per core".


    Thanks for your response,

    - Leo








    Leo Tohill
    Monday, October 20, 2008 1:39 PM
  • As far as I know, you can program job submission/activation filters.
    These are little executables that are installed (copied to some directory) on the head node. They parse the job file and return values like 0 for "OK to activate" or non-zero for "wait" ...


    To be honest, that is still a bit of a mystery to me.
    You may look at:
    http://archives.windowshpc.net/files/4/sdk/entry898.aspx
    https://windowshpc.net/Resources/Pages/Files.aspx?Tag=Script&Tag=Example&Tag=Job%20Scheduler&TagCleared=Windows%20Server%202008

    Maybe it is possible for you to identify "Job Class A" by project/task name, or by the name of the executable (a rough sketch of that idea follows below).

    But note: you cannot run 3 jobs on 1 core; on one core, there is at most 1 job.
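
    To make that a bit more concrete, here is a rough sketch of what such a filter executable could look like. It's written in Python just for readability - in practice it would be a small exe (or a wrapper around a script) installed on the head node as described above. The exit-code convention (0 = OK to activate, non-zero = wait) comes from this thread; the "Project" attribute and the helper functions are only assumptions for illustration, so check a real job file and the SDK links above before relying on them.

        # activation_filter_sketch.py -- illustrative skeleton, not a documented API.
        # Assumption (from this thread): the scheduler hands the filter the path of the
        # job's XML description file and treats exit code 0 as "OK to activate" and any
        # non-zero exit code as "keep the job waiting".
        import sys
        import xml.etree.ElementTree as ET

        def job_project(job_xml_path):
            # Assumption: the project name is stored as a "Project" attribute on the
            # root <Job> element; inspect a real job file from your cluster to confirm.
            try:
                root = ET.parse(job_xml_path).getroot()
                return root.attrib.get("Project", "")
            except (ET.ParseError, OSError):
                return ""

        def class_a_limit_reached():
            # Placeholder for whatever check fits your environment, e.g. counting the
            # Class A jobs currently running or measuring database load.
            return False

        def main():
            if len(sys.argv) < 2:
                return 0  # no job file supplied: let the job through rather than block it
            if job_project(sys.argv[1]) == "ClassA" and class_a_limit_reached():
                return 1  # non-zero: do not activate yet; leave the job in the queue
            return 0      # 0: OK to activate

        if __name__ == "__main__":
            sys.exit(main())

    Whether the project name really shows up that way in the job XML is something you would have to verify; the parsing is the part to adapt to what you actually see.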
    Monday, October 20, 2008 2:34 PM
  • Florian is headed down the right path with Submission and Activation filters. What you could do is write an Activation Filter that looks at the job that is about to be started and at the database load. It could return 0 if there is additional database capacity available, and return 1 if that job needs to wait because there is no additional capacity available.
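
    As a rough illustration of that return-code idea (0 = start the job, 1 = keep it waiting), the fragment below shows how a database-capacity check could drive the exit code. The threshold and the session-count function are made-up placeholders, not anything from the HPC SDK or a particular DBMS, and it's in Python only for readability - the real filter would just be a small executable on the head node.

        # Sketch of the return-code contract described above: 0 = activate, 1 = wait.
        # The capacity check is a made-up placeholder; swap in whatever query or
        # counter actually reflects the load on your database server.
        import sys

        MAX_DB_JOBS = 5  # illustrative threshold that reserves headroom for real-time work

        def current_db_job_count():
            # Placeholder: e.g. ask the DBMS how many sessions cluster jobs hold open,
            # or read a counter that the jobs themselves maintain.
            return 0

        def main():
            if current_db_job_count() >= MAX_DB_JOBS:
                return 1  # no spare capacity: tell the scheduler to hold this job
            return 0      # capacity available: OK to activate

        if __name__ == "__main__":
            sys.exit(main())

    Hooking the executable into the scheduler configuration is covered by the SDK material Florian linked above.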

    BTW . . . for more info on how Node/Socket/Core selection affects scheduling, check out my blog post here:
    http://windowshpc.net/Blogs/jobscheduler/Lists/Posts/Post.aspx?ID=3

    There's a lot of other helpful information in that blog for those starting out with Windows HPC Server!

    Thanks,
    Josh


    -Josh
    Tuesday, October 21, 2008 6:09 PM
    Moderator
  • A job submission filter isn't really the solution I'd like to find. If I use that, I'll have to build my own queue for jobs that aren't yet accepted, plus retry logic.

    I'm not saying that there should be an easier answer, but that doesn't stop me from looking. <s>
    Leo Tohill
    Tuesday, October 21, 2008 8:57 PM
  • Maybe an Activation filter is your solution?
    (I agree that a submission filter is probably not the right solution to your problem.)

    • Marked as answer by LeoTohill Thursday, November 6, 2008 7:24 PM
    Thursday, October 23, 2008 3:15 PM
  • Retry logic is handled by the scheduler; it will run the activation filter again every few minutes until the job is approved.
    -Josh
    Monday, October 27, 2008 11:25 PM
    Moderator