How to release assigned CPUs when the request queue is already empty and running requests are declining near the end of a SOA job

  • Question

  • Dear Community,

    I'm running SOA jobs, and I'm looking for the best way to release resource allocations near the end of a job, when the number of queued requests is 0 and the number of running requests is declining.

    I tried to nudge the scheduler by reducing the job's TargetResourceCount property to the number of requests that are still being calculated. Nevertheless, the scheduler keeps the idle tasks running until the last request has finished.

    To illustrate the case, I visualized the behavior in a Gantt chart and a resource allocation graph (see the images below):

    Gantt chart of the cluster: each row represents one of the 192 CPUs

    Note that between 9:10 and 9:20 the cluster is idle while two jobs are queued

     Cluster utilization graph: active CPUs (handling requests) per job

    In this test case, four jobs (697-700) are launched almost simultaneously.

    Setup: HPC Pack 2016 Update 1; cluster size: 192 CPUs; maxCores in the job template: 96; minCores: not set by the template; scheduling mode: Queued; preemption: immediate. Each task is a single process and consumes 1 CPU.
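    To make the arithmetic of this setup concrete, here is a toy first-come-first-served model in Python. It is a deliberate simplification under stated assumptions, not the HPC Pack scheduler: it only applies the cluster size and maxCores from the setup above, ignores minCores, priorities and preemption, and the helper names are hypothetical.

```python
# Toy model (assumption): NOT the HPC Pack scheduler. It ignores minCores,
# priorities and preemption and only applies cluster size and maxCores.


def queued_allocation(total_cpus: int, max_cores: int, job_ids: list[int]) -> dict[int, int]:
    """Serve jobs in arrival order, each getting up to max_cores of the free CPUs."""
    free = total_cpus
    allocation: dict[int, int] = {}
    for job in job_ids:
        granted = min(max_cores, free)  # a job cannot take more than what is left
        allocation[job] = granted
        free -= granted
    return allocation


def idle_cpus(allocated: int, running_requests: int) -> int:
    """CPUs held by a job that are not processing any request (the job's tail)."""
    return max(0, allocated - running_requests)


if __name__ == "__main__":
    alloc = queued_allocation(192, 96, [697, 698, 699, 700])
    print(alloc)  # 697 and 698 take all 192 CPUs; 699 and 700 get 0 and wait
    # Hypothetical tail: job 697 still holds its CPUs but only 10 requests are running.
    print(idle_cpus(alloc[697], 10))
```

    Because 2 × 96 = 192, jobs 699 and 700 can only start once jobs 697 and 698 hand CPUs back, so every idle CPU in a tail is a CPU the queued jobs cannot use.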

    Preferably, the queued jobs (699 and 700) would start on the CPUs that are no longer needed by jobs 697 and 698.

    Current (not working) solution:

    As the log snippet below shows, the client application currently reduces the TargetResourceCount property of job 697, but the scheduler does not act on it.

    I could force the task to stop from the client side; however, the scheduler would most likely reschedule it immediately.

    I'm probably attacking this problem from the wrong angle, so I would be very happy to receive any feedback and suggestions!

    Activity log of job 697

    Note that the activity log timestamps are offset by one hour from the graphs above.
    • Edited by Joris Cramwinckel Tuesday, December 3, 2019 4:31 PM inserted note to image
    Tuesday, December 3, 2019 4:18 PM


All replies

  • Hi Joris,

    The behavior you describe is handled by an HPC Pack feature called automatic shrink. This feature only works in Balanced mode. From the information you provided, you are using Queued mode, so it is automatically disabled.

    After you switch to Balanced mode, also check the cluster's automatic shrink setting: open an elevated command prompt, run "cluscfg listparams | findstr AutomaticShrinkEnabled", and make sure it shows "True".
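    If you want to script this check, a minimal Python sketch could run the same command and look for the parameter. The exact layout of the "cluscfg listparams" output is an assumption here (the parser accepts either "Name : Value" or whitespace-separated "Name Value" lines), and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Assumption: each parameter is printed on its own line as "Name : Value"
# or "Name   Value". The real cluscfg output layout may differ.
_PATTERN = re.compile(r"\s*AutomaticShrinkEnabled\b\s*[:=]?\s*(\S+)")


def automatic_shrink_enabled(listparams_output: str) -> Optional[bool]:
    """Return True/False for AutomaticShrinkEnabled, or None if the line is missing."""
    for line in listparams_output.splitlines():
        match = _PATTERN.match(line)
        if match:
            return match.group(1).lower() == "true"
    return None


def check_cluster() -> Optional[bool]:
    """Run the same command as above, without findstr (needs an elevated prompt
    on a machine with the HPC Pack tools installed)."""
    result = subprocess.run(
        ["cluscfg", "listparams"], capture_output=True, text=True, check=True
    )
    return automatic_shrink_enabled(result.stdout)
```

    Returning None rather than False for a missing line keeps "parameter not found" distinguishable from "explicitly disabled".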

    P.S. If you are mainly using SOA jobs, we just released a public preview of our open-source SOA framework, Microsoft Telepathy. It shares a large portion of its code base with HPC Pack and uses Azure Batch as its back end. Please have a look if you are interested.

    Best Regards,

    Zihao

    Monday, December 9, 2019 3:19 AM
  • Hi Zihao,

    Thanks a lot for the suggestion. I ran a slightly different case with the scheduling mode set to Balanced, and I verified the AutomaticShrinkEnabled property as you suggested.

    The test case is as follows:

    I submitted 4 jobs, all with the same job template, each requesting 96 CPUs on a 192-CPU cluster. The first job to arrive (684) is assigned the full 96, while the others share the remaining resources roughly equally (as expected). However, when job 684 is ending, CPUs stay allocated to the job even when there are no requests left. The client application actively reduces the TargetResourceCount, but the broker node seems to ignore these changes.

    The jobs in this example do not have very long tails, but the case is clear enough to show our goal: minimizing gaps like the one displayed in the graphs between 14:50 and 15:00.

    By the way, the reason we override the TargetResourceCount is that our task dependencies form a DAG, and initializing our tasks on a cold CPU typically takes around 40 seconds; that's why we keep them warm during the job. However, I have not yet found a proper way to force the broker node to release the idle service tasks near the end.
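    One way to express the "keep warm, but release at the tail" trade-off is as a client-side rule for the value written to TargetResourceCount. The Python sketch below is hypothetical: it assumes the client can estimate, for each not-yet-ready DAG task, how many seconds remain until it becomes ready, and it keeps a CPU only if such a task arrives within the roughly 40-second warm-up time mentioned above. Actually writing the value to the job through the HPC Pack API is left out.

```python
from dataclasses import dataclass, field

WARMUP_SECONDS = 40.0  # cold-start cost of a service task, as reported in this thread


@dataclass
class Snapshot:
    running: int  # requests currently being calculated
    queued: int  # requests ready but not yet dispatched
    # Hypothetical input: estimated seconds until each pending DAG task becomes ready.
    seconds_until_ready: list[float] = field(default_factory=list)


def target_resource_count(s: Snapshot, max_cores: int, warmup: float = WARMUP_SECONDS) -> int:
    """CPUs worth holding: current work plus tasks arriving before a cold CPU would be warm."""
    needed_now = s.running + s.queued
    arriving_soon = sum(1 for t in s.seconds_until_ready if t <= warmup)
    return min(max_cores, needed_now + arriving_soon)
```

    In the tail of a job (nothing queued, no pending DAG tasks) the rule reduces to the number of running requests, which is the value described in the original question. It errs on the side of keeping CPUs warm, since a task that arrives soon might also reuse a CPU freed by a finishing request.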

    Do you happen to have other suggestions? Very much appreciated!

    Best Regards,

    Joris

    Gantt chart

    Cluster utilization


    Joris Cramwinckel

    Tuesday, December 10, 2019 4:01 PM
  • Hi Joris,

    Could you help us collect the HPC log files related to this issue?

    We will need the broker worker logs, located at %CCP_LOGROOT_SYS%SOA\HpcBrokerWorker_*.bin

    Zihao

    Wednesday, December 11, 2019 1:59 AM
  • I can share the logs; how would you like me to send them?

    Maybe analyzing the broker configuration from the logs would already help. The jobs run with the configuration below:

    [Monitor]
       LoadSamplingInterval = 1000
       AllocationAdjustInterval = -1             <- This setting is currently my prime suspect
       ClientIdleTimeout = 3000000
       ClientConnectionTimeout = 300000
       SessionIdleTimeout = 60000
       StatusUpdateInterval = 15000
       MessageThrottleStartThreshold = 4096
       MessageThrottleStopThreshold = 3072
       ClientBrokerHeartbeatInterval = 20000
       ClientBrokerHeartbeatRetryCount = 3
    [BaseAddress]
       Http = 
       Https = 
       NetTcp = 
    [LoadBalancing]
       ServiceRequestPrefetchCount = 1
       EndpointNotFoundRetryCountLimit = 10
       EndpointNotFoundRetryPeriod = 300000
       MultiEmissionDelayTime = -1
       MessageResendLimit = 3
       ServiceOperationTimeout = 86400000
       MaxConnectionCountPerAzureProxy = 16
       DispatcherCapacityInGrowShrink = 0

    Cheers,

    Joris


    Joris Cramwinckel


    Wednesday, December 11, 2019 2:10 PM
  • Hi Joris,

    Yes, this line is related. Automatic shrink only happens during allocation adjustment (AllocationAdjust). When the interval is set to -1, all grow/shrink-related features are disabled.

    In our sample code, this value is 5000. Feel free to try some different values.
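    To catch this setting before submitting jobs, a small Python sketch can scan a broker configuration dump in the same "[Section]" / "Key = Value" layout as the one posted above and flag an AllocationAdjustInterval of -1. Only that text layout is assumed; reading the actual service configuration is not covered, and the function names are hypothetical.

```python
def parse_broker_dump(text: str) -> dict[str, dict[str, str]]:
    """Parse '[Section]' headers and 'Key = Value' lines from a text dump.

    Trailing '<- ...' annotations (as in the posted config) are dropped.
    """
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("<-", 1)[0].strip()  # drop notes like '<- prime suspect'
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()  # empty values (e.g. 'Http =') stay ""
    return sections


def grow_shrink_disabled(sections: dict[str, dict[str, str]]) -> bool:
    """Per the reply above, AllocationAdjustInterval = -1 disables grow/shrink."""
    value = sections.get("Monitor", {}).get("AllocationAdjustInterval")
    return value is not None and int(value) == -1
```

    Applied to the configuration posted earlier in this thread, grow_shrink_disabled returns True; with the sample value of 5000 it returns False.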

    Zihao

    Thursday, December 12, 2019 1:58 AM
  • I'm glad to say that the AllocationAdjustInterval was indeed what prevented the broker from shrinking the resources. As the graph below shows, the long-tailed job (768) freed its resources shortly after finishing its last task. Thanks, Zihao, for the suggestions!

    A side effect is that both our client application and the broker now set the job's TargetResourceCount, which can lead to conflicting values. Since the client knows about the upcoming tasks, we want to mute the broker. As this is a different problem, I will start a new thread on finding the best way to keep the CPUs warm.

    After implementing Zihao's suggestion



    Joris Cramwinckel

    Thursday, December 12, 2019 12:36 PM