how does machine allocation work when there are more tasks than machines?

  • General discussion

  • Here is my scenario.

    I am running 3 jobs, each with 20 tasks, and I have 50 machines total.  Each task takes somewhere between half a day and 2 days to run.  The first 2 jobs start up, each task within those jobs is assigned a machine, and they start running.  The 3rd job starts up, but only 10 of its tasks actually start (since there are only 10 machines remaining).  The thing I've noticed is that as tasks finish from jobs 1 & 2, additional tasks from job 3 do not always start up immediately.  If I restart job 3, it sees the available machines and starts running on all of the ones available at that time, but I would like the remaining tasks to pick up machines as they are freed by the other jobs.  My suspicion is that the machines to use are allocated when the job is kicked off, so the job has to wait for those specific machines to become available before starting more tasks, rather than using any machine that frees up.

    Thanks!


    Monday, August 1, 2016 9:41 PM

All replies

  • The behavior is affected by your scheduler policy settings. Please check whether your scheduling mode is set to "Queued" and whether the setting below is enabled:
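
    A minimal sketch of checking these settings from the head node (assuming PowerShell with the HPC Pack snap-in; the cmdlet parameter names are assumptions and may differ between HPC Pack versions):

        # Load the HPC Pack cmdlets (shipped as a PowerShell snap-in)
        Add-PSSnapin Microsoft.HPC

        # List the current cluster/scheduler configuration values
        Get-HpcClusterProperty

        # Make sure the scheduler runs in Queued (not Balanced) mode
        Set-HpcClusterProperty -SchedulingMode Queued

    The resource adjustment (grow/shrink) policy can also be reviewed in HPC Cluster Manager under Job Scheduler Configuration.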


    Qiufang Shi

    Wednesday, August 3, 2016 8:41 AM
  • I've confirmed we already had these options set.  

    I saw a similar issue earlier today: we had 1 job with 20 tasks running, and when we kicked off a 2nd job with 400 tasks (max nodes set to 25), it did not take 25 machines even though 45 should have been available.  I wasn't able to log in to HPC Cluster Manager at the time, but I checked a few minutes ago and it now shows 25 machines allocated (and on my program's end I see 25 tasks running simultaneously).  I'm going to try to get access to Cluster Manager while the problem is actually happening, so I can see what is allocated at the time; checking after the fact unfortunately didn't turn up very useful information.

    Wednesday, August 3, 2016 7:24 PM
  • Okay, when the issue happens again, please check the following information:

    1. The output of "node listcores", which will show where the cores are allocated.

    2. The scheduler logs, along with the job information (export the job to an XML file), and share them with us.
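
    For example, run the following from an HPC PowerShell window on the head node (the job ID 1234 is just a placeholder, and 'job view /detailed:true' is an assumption about the CLI; the job XML can also be exported by right-clicking the job in HPC Cluster Manager):

        # Capture where every core is currently allocated
        node listcores | Out-File cores.txt

        # Capture the detailed state of the job in question
        job view 1234 /detailed:true | Out-File job1234.txt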


    Qiufang Shi

    Thursday, August 4, 2016 1:52 AM
  • Hi Qiufang,

    I was able to take a closer look and found that the issue was in our code.  The tasks I thought were finished had not actually completed, so it only appeared that fewer than the maximum allowed number of nodes were running.

    Thanks for the help!

    -Jason

    Thursday, August 4, 2016 3:19 PM
  • I have a similar question - I have set the scheduler to Queued mode as described in Qiufang's post above, and I have a job with 60 tasks. There are 32 cores available across 5 workstations in my cluster; when the job runs, it takes no more than 21 to 24 cores. How can I force the scheduler to take *all* the cores available in the cluster?

    The issue is that some of my 'longer' tasks are towards the end, so the total elapsed time for my job is longer than I expect it to be. When I looked into the job details, it's using 24 sub-tasks (which I am assuming map to cores), so 24 tasks can run in 'parallel' at any given point in time. I'd like all the cores to be utilized.

    Thanks

    Friday, August 5, 2016 3:28 PM
  • Hi,

    Could you share the job XML with us (through hpcpack@microsoft.com)? We can take a look at why not all of the cores are allocated to your job.


    Qiufang Shi

    Sunday, August 7, 2016 11:58 PM
  • I have sent the job XML file to the email address as requested above. Please advise.

    Monday, August 8, 2016 1:24 PM
  • Hi SRIRAM & Jason,

    If you are running SOA jobs, please check the serviceRequestPrefetchCount value in the service registration file and see whether setting the value to 0 resolves the problem. Also, for SOA jobs, if the number of requests (calls) is lower than the number of service hosts (tasks), the SOA job can automatically shrink the service hosts (tasks) of the job.

    Note that in HPC Pack SOA, jobs/tasks have a different meaning from requests/calls. A job is a container for tasks, and a task is a service host process running on a compute node. The requests/calls are the actual work sent by the SOA client, via the broker node, to the service hosts. Once service request prefetch is enabled by setting the count to a value greater than 0, more than one request can be sent to a single task (service host) in the SOA job; so if the number of requests is low, some of the idle tasks (service hosts) in the job will be automatically shrunk because no requests/calls are dispatched to them.
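
    As an illustration, the prefetch count lives in the loadBalancing section under the broker node of the service registration file (the excerpt below is minimal and illustrative; other loadBalancing attributes are left at their defaults):

        <microsoft.Hpc.Broker>
          <!-- Default is 1: the broker prefetches one extra request per service host,
               so requests can concentrate on fewer hosts. Setting it to 0 disables
               prefetch, so each service host holds at most one request at a time. -->
          <loadBalancing serviceRequestPrefetchCount="0" />
        </microsoft.Hpc.Broker>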

    Regards,

    Yutong Sun


    Tuesday, August 9, 2016 8:45 AM
    Moderator
  • I'm not sure I understand it correctly. I have submitted 60 requests from a durable session, which translates to 60 'Requests' under Cluster Manager -> Job Management -> SOA Jobs for that job, and I have 32 cores across 4 workstation nodes. I expect to see 32 simultaneous tasks handling 32 of these 60 requests, but the Activity log shows the allocation reduced on one or more of those nodes...

    [No value set for prefetch under loadbalancing should imply a default value of 1]

    Secondly, my issue with prefetch, as I understand it, is that a request already queued on a task will have to wait, even if there are other tasks/cores that could process it...

    In any event, if one task maps to one 'servicehost.exe' on a workstation node, and I have 32 cores, I'd expect to see 32 'tasks', each of them handling one request, with the default settings, no? All I see is the number of tasks varying between 20 and 24 for my job with 60 'requests', leading to a longer elapsed time.

    Did you get a chance to look at my job.xml?



    Tuesday, August 9, 2016 12:51 PM
  • Hi SRIRAM,

    By default the value of serviceRequestPrefetchCount is 1, so we need to explicitly set it to 0 to disable the request prefetch feature. I've updated my last post, since the number should be set to 0 instead of 1. Sorry for the mistake.

    So in your case, with the default request prefetch count of 1, the 60 requests would ideally be dispatched to 30 service hosts/tasks. If only 20-24 service hosts/tasks are running, the reasons could be: 1) the request processing time is so short that more than 2 requests were sent to some service hosts; or 2) the job scheduler was not able to start tasks/service hosts on a certain workstation node (due to an availability policy or a connectivity issue), or the broker failed to communicate with some of the service hosts to dispatch requests.

    To troubleshoot, I would recommend first submitting a normal job with the minimum core count set to all 32 cores and seeing whether the job succeeds, to confirm that all the workstation nodes are available. Then turn on SOA message-level tracing to see whether any requests failed to be dispatched to service hosts on a certain node and were then rerouted by the broker to succeed on other healthy nodes.
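
    A quick way to run that first check (a hedged sketch; it assumes the HPC Pack 'job' CLI with the /numcores option and that 32 is the total core count of the cluster):

        # Submit a trivial job that requires exactly 32 cores; if it starts and
        # succeeds, every workstation node is reachable and schedulable.
        job submit /numcores:32-32 hostname

        # Then check its progress with, e.g.:  job view <jobId> /detailed:true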

    I checked your job.xml and it looks fine to me.

    Regards,

    Yutong Sun

    Thursday, August 11, 2016 4:35 AM
    Moderator
  • Thanks for the reply. Setting serviceRequestPrefetchCount = 0 in the loadBalancing section under the broker node in the SOA config file seems to work. But it still takes a good 4-5 minutes before all the cores are used. Until then, a few cores show 'Job Scheduled' (in the output of node listcores).

    All of this with just 1 job on the cluster with 60 'requests'.

    Thursday, August 11, 2016 7:37 PM
  • Could you turn on and check the message-level trace, which will tell you the details of each request, including when and where the requests were dispatched by the broker?

    For diagnostic purposes, you may also use the built-in EchoClient.exe; e.g. 'EchoClient.exe -n 100 -timeMS 600000' creates a session with 100 requests, each lasting 10 minutes.

    Regards,

    Yutong Sun

    Friday, August 12, 2016 1:47 AM
    Moderator