Slow dispatching or UI updates for large jobs?

  • Question

  • I have a large job with over 3000 tasks. The UI and progress seem accurate until about 70%; after that point, filtering by running state no longer shows the running tasks, and in the task list some tasks stay in the "Dispatching" state for minutes. However, the global progress advances (though it seems to advance more slowly, and I'm not even sure all the cores are being used), so I don't know whether it is a scheduling issue or just a UI update one.

    Monday, November 1, 2010 7:07 PM


All replies

  • A few more details: the tasks execute very fast (about 30 sec), and there seems to be a gap of 7 minutes in the start times between batches of tasks... This is odd!
    Erik Putrycz - Solution Architect - Apption Software
    Monday, November 1, 2010 7:11 PM
  • I've tried to repro this issue, so far without success.  Can you give me more details regarding the cluster so I can investigate further?  Specifically:

    • If other jobs are also running, how many, and what scheduling settings are involved (priorities, balancing vs. queue, etc.)?
    • I'm assuming the tasks require a single core each.  If not, please give more detail.
    • How many nodes exist in the cluster, and how many cores per node?
    • Looking at the logs of the tasks within the job: around 70%, are the tasks actually being delayed (and if so, is the delay between tasks, or is the task itself taking longer than expected), or are they progressing at the expected speed while the UI isn't displaying it appropriately?
    • Any details regarding data read/write issues?

    Thanks for the additional detail.
    Clark 

    Tuesday, November 2, 2010 10:38 PM
  • One possible cause of the previous issue is that we had the HPC database on our production DB server: I saw a timeout in the HPC logs, so I installed SQL Server 2008 R2 Express on the head node and moved the DB back there.

    I just ran a similar large job again. The structure of the job is the following:

    • 1500 tasks of type A
    • 1500 tasks of type B, each depending on a single A task
    • task C, which depends on the 1500 B tasks
    • task D, which depends on task C

    After something like 63%, the dispatching suddenly slows.

    I tried two ways of ordering the tasks:

    • task A-1, task B-1, task A-2, task B-2, ...
    • task A-1, task A-2, ..., and then task B-1, task B-2, ...

    This doesn't seem to make much of a difference.

    Also, for some reason, HPC always runs all the A tasks first before starting any B task.
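
    A minimal sketch of how a job with this structure might be created through the HPC .NET API (Microsoft.Hpc.Scheduler); the head-node name and the worker command lines below are placeholders, not the actual application:

        using Microsoft.Hpc.Scheduler;

        IScheduler scheduler = new Scheduler();
        scheduler.Connect("headnode");                 // placeholder head-node name
        ISchedulerJob job = scheduler.CreateJob();

        for (int i = 1; i <= 1500; i++)
        {
            ISchedulerTask a = job.CreateTask();
            a.Name = "A-" + i;
            a.CommandLine = "workerA.exe " + i;        // placeholder command
            job.AddTask(a);

            ISchedulerTask b = job.CreateTask();
            b.Name = "B-" + i;
            b.CommandLine = "workerB.exe " + i;        // placeholder command
            b.DependsOn.Add("A-" + i);                 // B-i depends on its A-i
            job.AddTask(b);
        }

        ISchedulerTask c = job.CreateTask();
        c.Name = "C";
        c.CommandLine = "aggregate.exe";               // placeholder command
        for (int i = 1; i <= 1500; i++)
            c.DependsOn.Add("B-" + i);                 // C depends on all 1500 B tasks
        job.AddTask(c);

        ISchedulerTask d = job.CreateTask();
        d.Name = "D";
        d.CommandLine = "finalize.exe";                // placeholder command
        d.DependsOn.Add("C");
        job.AddTask(d);

        scheduler.SubmitJob(job, null, null);          // null credentials prompt interactively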


    Erik Putrycz - Solution Architect - Apption Software
    Thursday, November 18, 2010 6:53 PM
  • Thanks for the additional information.  While I've still been unable to reproduce the problem, I do have some suggestions.

    First, it makes sense that all A tasks completed before any B task started if dependencies were declared.  The documentation states:  "All tasks in a group must finish before any tasks in the next group can start."  If you want task B-1 to launch immediately after task A-1 completes, write both of them in a single script.  Not only will B start faster (since it doesn't have to be scheduled separately), it will run on the same core/thread as A, which could be advantageous in some circumstances.
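
    To illustrate (a sketch only, reusing the placeholder commands from the snippet earlier in the thread): rather than submitting B-i as a separate task that depends on A-i, submit one task whose command line chains both steps, so B-i runs immediately after A-i on the same core.

        for (int i = 1; i <= 1500; i++)
        {
            ISchedulerTask ab = job.CreateTask();
            ab.Name = "AB-" + i;
            // cmd /c "x && y" runs workerB only if workerA succeeded
            ab.CommandLine = string.Format("cmd /c \"workerA.exe {0} && workerB.exe {0}\"", i);
            job.AddTask(ab);
        }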

    Second, what was the cause of the slowdown at ~63%?  Are all tasks finishing in roughly the same duration, or is there a delay between the end time of one task and the start time of the next?  If it is the task itself, you'll need to check its log; if it is the scheduler, I'd want to see whether there are any timeouts within HPC.

    I'd still like to be able to reproduce the problem here.  Any additional info regarding the number of nodes, cores per node, etc. would be helpful.  I've been attempting it on a homogeneous cluster of 7 nodes with 8 cores each.  The tasks are simplistic and don't demand much data work, so memory and read/write paths are not stressed.
    Clark

    Thursday, December 2, 2010 5:27 PM
  • Thanks for looking into this issue! I would really like to see our issue go away now that our solution is based on Windows HPC. Our cluster is small: we have 7 compute nodes, 4 of them with 8 cores and 3 with 2 cores. For this job, though, the HPC tool always allocates the job to one of the 8-core compute nodes.

    The job with this issue has the same problem no matter how many cores I allocate (usually between 6 and 12). Each task uses a single core and does mostly database work (with two separate database servers this time).

    Concerning the problem with A and B tasks, I have already combined both into a single task, so my job now has only 1500 A tasks and a single B task that depends on them. I should add that I create all the tasks using the HPC API, not the UI, so the notion of a group wasn't obvious to me; I thought groups were specific to the UI.

    Another symptom of the scheduler's responsiveness: when I resume the job (which is near 89%), it takes 3 minutes until any task starts running. Each task takes between a couple of minutes and 30 minutes. The tasks start running, but once some are completed, it can take up to a couple of minutes until others are scheduled.

    I checked all system logs, including the HPC logs, and I did not see anything there. I'll be glad to provide any diagnostics I can collect.

    Last point: I had to disable the automatic update of the etc/hosts file to get DHCP and DNS working. I cannot rely only on the hosts file on each node because we need full name resolution for MSDTC.


    Erik Putrycz - Solution Architect - Apption Software
    Thursday, December 2, 2010 7:38 PM
  • Do the 1500 'A' tasks all have different task names?  If they do, try running them all with the same task name.  This issue could be due to the number of task names that have to be checked in order to verify that a group of tasks has actually finished.

    Let me know if this wasn't the issue.  If the 'A' tasks did have distinct names, did changing them all to a single identical name fix the problem?
    Clark

    Note:  Regarding the job always being assigned to one of the nodes with 8 cores available: by default, HPC's Job Scheduler allocates resources by sorting nodes with the most available cores first and, when those are equal, by a secondary sort on the most available memory.  If you want to change that, look into the /orderby parameter.  For example, /orderby:-cores sorts nodes so that those with the fewest available cores come first.  If you want a specific node or set of nodes to be used, consider the /nodegroup or /requestednodes parameters.

    Thursday, December 2, 2010 8:58 PM
  • Yes, each task has an individual name. That's a start. But how do I set dependencies on B if all my A tasks have the same name? Within the API, you list the dependent tasks by name. If I add only one name as a dependency, is HPC going to figure out that that one name represents all 1500 tasks?
    Erik Putrycz - Solution Architect - Apption Software
    Thursday, December 2, 2010 9:29 PM
  • Yes: if you declare a dependency on a single task name, HPC requires that all tasks with that name have finished before it starts the next task.  That's also true with multiple dependency names:  all tasks matching any of the listed task names must complete before the next task begins.
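
    Concretely, a minimal sketch under the same assumptions as the earlier snippets (Microsoft.Hpc.Scheduler API, placeholder commands): all 1500 tasks share one name, and the fan-in task declares a single dependency on it.

        for (int i = 1; i <= 1500; i++)
        {
            ISchedulerTask a = job.CreateTask();
            a.Name = "A";                          // one shared name for all 1500 tasks
            a.CommandLine = "workerA.exe " + i;    // placeholder command
            job.AddTask(a);
        }

        ISchedulerTask b = job.CreateTask();
        b.Name = "B";
        b.CommandLine = "aggregate.exe";           // placeholder command
        b.DependsOn.Add("A");                      // waits for every task named "A"
        job.AddTask(b);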

    Let me know if this solves the issue.
    Clark

    Thursday, December 2, 2010 9:46 PM
  • When I have a single task name for all my 1500 tasks, the responsiveness of the scheduler is much better; so far I haven't seen the delay when starting new tasks. Thanks for helping on this!

    On the other hand, 1500 names isn't much data to handle; any chance you could open a bug on this one?


    Erik Putrycz - Solution Architect - Apption Software
    Monday, December 6, 2010 4:51 PM
  • This issue has been submitted.  Thanks very much for helping me investigate it.

    Another potential workaround would be to not declare dependencies at all, but instead define the later tasks as requiring all of the resources allocated to the job.

    For example, if you've defined /numcores:12 for your job, define the first 1500 tasks as each needing a single core, and define the task that runs afterward as needing all 12 cores.  Because all 12 cores won't be free until the first 1500 tasks have finished, the scheduler doesn't need to check declared dependencies to get the ordering.  But be careful!  If there are further tasks downstream, i.e. task C depends on task B, which depended on all 1500 A tasks, then the dependency of C needs to remain defined, or C also needs to be defined as needing all 12 cores.
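
    A sketch of this workaround, under the same assumptions as the earlier snippets (and continuing from the scheduler connection there): the job is capped at 12 cores, each worker task requests one core, and the fan-in task requests all 12, so it cannot start until every worker has released its core.

        ISchedulerJob job = scheduler.CreateJob();
        job.MinimumNumberOfCores = 12;             // matches /numcores:12
        job.MaximumNumberOfCores = 12;

        for (int i = 1; i <= 1500; i++)
        {
            ISchedulerTask a = job.CreateTask();
            a.CommandLine = "workerA.exe " + i;    // placeholder command
            a.MinimumNumberOfCores = 1;            // each worker needs a single core
            a.MaximumNumberOfCores = 1;
            job.AddTask(a);
        }

        ISchedulerTask fanIn = job.CreateTask();
        fanIn.CommandLine = "aggregate.exe";       // placeholder command
        fanIn.MinimumNumberOfCores = 12;           // no DependsOn: requiring all 12 cores
        fanIn.MaximumNumberOfCores = 12;           // means it starts only after the workers finish
        job.AddTask(fanIn);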

    Clark

    Monday, December 6, 2010 10:44 PM