Monday, October 20, 2008 2:01 AM
I'm trying to figure out if the Windows HPC Job Scheduler addresses my requirements.
1. Users will submit jobs that, in general, should execute in FCFS order, though there may be some prioritization. It's fine for several jobs to execute concurrently, but it's also important that not too many of them do, because each job imposes a load on the database server, and we must reserve some DBMS capacity to provide good response time to real-time tasks. How can I configure jobs and the scheduler to allow only a limited amount of concurrency?
2. In a cluster, must the compute nodes be HPC Server 2008 machines, or only the head node?
Monday, October 20, 2008 10:22 AM
The reason for a cluster is to execute a lot of jobs in parallel - if all your jobs need to talk to a database running on only one server, that's a bottleneck.
So, if your database server can serve only, say, 20 client jobs at a time, it makes no sense at all to have more than 20 processors in your cluster - unless you switch to a clustered database.
The HPC scheduler only runs jobs in parallel if enough resources (cores/sockets/nodes) are available. For example, if your cluster has only two nodes and your users submit jobs that each need one node, you won't have more than two jobs running at once.
Normally, you install the head node as Windows HPC Server 2008 (first install a Windows Server 2008 operating system, then install the HPC Pack).
In a normal scenario, the head node installs the compute nodes automatically - it uses Windows Deployment Services to install the OS, and then it installs and configures the HPC Pack so the computer becomes a compute node.
You cannot use Vista or XP for compute nodes, if that's what you mean. But you can install compute nodes manually (install Server 2008 and the HPC Pack) - I've never tried this.
What do you mean by FCFS?
Monday, October 20, 2008 1:39 PM
Sorry: FCFS means "first come, first served".
It appears that I could use the "core" resource as a rough constraint on concurrency. Of the scheduling constraints available (core, socket, node, and I think memory), core is the finest-grained. But still, due to the I/O characteristics of the process, it will be only roughly correct. I wish that I could classify jobs (class A, B, C...) and then define rules such as "max 3 concurrent Class A jobs per core".
Thanks for your response,
Monday, October 20, 2008 2:34 PM
As far as I know, you can program job submission/activation filters.
These are small .exe files installed (copied to some directory) on the head node. They parse the job file and return values like 0 for "OK to activate" or a nonzero value for "wait" ...
To be honest, that is still a bit of a mystery to me.
You may look at:
Maybe it is possible for you to identify "Job Class A" jobs by project/task name, or by the name of the executable.
But note: you cannot run 3 jobs on 1 core; on one core, there is at most 1 job.
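As a rough sketch of that idea (written here in Python for readability - a real filter must be an .exe on the head node, and the `Project` attribute used below is just an assumption; check the actual HPC job file schema):

```python
import sys
import xml.etree.ElementTree as ET

def job_class(job_xml_text):
    """Read a class label from the job XML. We assume here that the label
    is stored in the Project attribute of the root element -- this is an
    assumption, not the documented schema."""
    return ET.fromstring(job_xml_text).get("Project", "")

def decide(job_xml_text, running_class_a, max_class_a=3):
    """Filter exit-code logic: 0 = OK to activate, nonzero = keep waiting."""
    if job_class(job_xml_text) == "ClassA" and running_class_a >= max_class_a:
        return 1  # too many Class A jobs already running -> hold this one
    return 0      # OK to start

if __name__ == "__main__" and len(sys.argv) > 1:
    # The scheduler passes the job file path as the first argument.
    with open(sys.argv[1], encoding="utf-8") as f:
        job_xml = f.read()
    # Counting the currently running Class A jobs is left out here --
    # you would have to query the scheduler for that yourself.
    sys.exit(decide(job_xml, running_class_a=0))
</imports>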
Tuesday, October 21, 2008 6:09 PM (Moderator)
Florian is headed down the right path with submission and activation filters. What you could do is write an activation filter that looks at the job which is about to be started and at the database load. It could return 0 if there is additional database capacity available, and return 1 if that job needs to wait because there is no additional capacity available.
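In the simplest form, the decision itself could look like this (a minimal sketch - the capacity figure of 20 is just an example, and how you measure the current database load, e.g. by querying the DBMS for its session count, is up to you):

```python
def activation_decision(active_db_sessions, db_capacity=20):
    """Decide whether the about-to-start job fits within the remaining
    database capacity. Returns the filter exit code:
    0 = start the job, 1 = make it wait."""
    return 0 if active_db_sessions < db_capacity else 1
```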
BTW . . . for more info on how Node/Socket/Core selection affects scheduling, check out my blog post here:
There's a lot of other helpful information in that blog for those starting out with Windows HPC Server!
Tuesday, October 21, 2008 8:57 PM
A job submission filter isn't really the solution I'd like to find. If I use that, I'll have to build my own queue for jobs that weren't yet accepted, plus retry logic.
I'm not saying that there should be an easier answer, but that doesn't stop me from looking. <s>
Thursday, October 23, 2008 3:15 PM
Maybe your solution is an activation filter?
(I agree that a submission filter is perhaps not the right solution to your problem.)
- Marked As Answer by LeoTohill Thursday, November 06, 2008 7:24 PM
Monday, October 27, 2008 11:25 PM (Moderator)
Retry logic is handled by the scheduler; it will run the activation filter again every few minutes until the job is approved.