BrokerPersistQueue errors out at default time even after editing

  • Question

    1. I've read the forum post https://social.microsoft.com/Forums/en-US/72621f6f-021e-4ef0-9098-5ae92ff90197/brokerpersistqueue-flush-timeout-what-does-it-mean?forum=windowshpcsched, which contained the internal link https://technet.microsoft.com/en-us/library/ff943786(WS.10).aspx#BKMK_broker. I tried editing the service file 'Microsoft.Hpc.Excel.ExcelService', changing 'serviceInitializationTimeout' to '120,000' (double the time) to allow it to fill the queue, but it still times out after '60000'. Is this the correct file to edit? If not, which should it be?
    2. The queue 'Number of Requests' field in the Cluster Manager>Job Management queues up to around 12,300 (out of 61000+). 'messageThrottleStartThreshold' and 'messageThrottleStopThreshold' were left at their default values of 4096 and 3072, respectively. I thought the throttle limit would only allow the requests to go up to 4096, or does the cluster manager show all requests and only the broker allows 4096?
    3. I left a job running even after the msgbox popped up with 'BrokerPersistQueue' and the outstanding requests continued to go down. Is this the correct behavior? It seems odd that it would continue to send off requests after an error occurred.
    Thursday, June 29, 2017 8:09 PM


All replies

  • Hi IvenBach,

    1. 'serviceInitializationTimeout' is used by the broker worker to handle the expected EndpointNotFoundException while the service host is initializing. It is not related to the flush timeout. The broker flush timeout is 60,000 ms by default and can be specified in the client call brokerClient.EndRequests(int timeoutMilliseconds) or brokerClient.Flush(int timeoutMilliseconds). If you are using ExcelService for workbook offloading, the EndRequests API call is made in the underlying ExcelDriver with the default timeout, so you cannot specify it in the workbook VBA macros. To avoid the timeout, consider reducing the number of requests in a session and/or improving the network I/O and broker node hardware spec.

    2. 'messageThrottleStartThreshold' and 'messageThrottleStopThreshold' are used by the broker node to start/stop throttling new requests according to the current number of messages (both requests and responses) in the broker queues. These numbers are different from the total 'Number of Requests' of a session. A request, especially one that has completed and had its response retrieved by the client, can be removed from the broker queues.

    3. For an interactive session (the default in ExcelService), requests are dispatched by the broker as soon as they are received from the client; there is no wait for the EndRequests call. So the outstanding requests keep going down. You can close the broker client or the session to stop processing.

    Regards,

    Yutong Sun

    Monday, July 3, 2017 2:45 AM
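
    The start/stop thresholds described in point 2 above act like a hysteresis band on the broker queue depth. The following is only a toy model in Python, not the HPC Pack broker code: the class name ThrottleGate, its methods, and the exact comparison rules (>= to start, <= to stop) are assumptions made for illustration. It uses the default values 4096 and 3072 quoted in the question.

    ```python
    class ThrottleGate:
        """Toy model of start/stop throttling (hypothetical, not the real broker)."""

        def __init__(self, start_threshold=4096, stop_threshold=3072):
            if stop_threshold >= start_threshold:
                raise ValueError("stop threshold must be below start threshold")
            self.start_threshold = start_threshold
            self.stop_threshold = stop_threshold
            self.queued = 0          # messages currently held in the queue
            self.throttling = False  # True while new messages are refused

        def _update(self):
            # Assumed rule: start at or above the start threshold,
            # stop at or below the stop threshold.
            if not self.throttling and self.queued >= self.start_threshold:
                self.throttling = True
            elif self.throttling and self.queued <= self.stop_threshold:
                self.throttling = False

        def try_enqueue(self):
            """Accept one message unless throttling; return True if accepted."""
            if self.throttling:
                return False
            self.queued += 1
            self._update()
            return True

        def complete(self, count=1):
            """Remove finished messages from the queue."""
            self.queued = max(0, self.queued - count)
            self._update()


    gate = ThrottleGate()
    accepted = sum(gate.try_enqueue() for _ in range(5000))
    print(accepted, gate.throttling)   # 4096 True
    gate.complete(1024)                # queue drains to 3072
    print(gate.throttling)             # False
    ```

    In this model the queue depth never exceeds the start threshold, yet the total number of messages accepted over a session can be far larger, because completed messages leave the queue and make room for new ones.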
  • Yutong,

    Regarding 1) When submitting a large number of requests, possibly 100,000 or more, I won't know the amount until I'm going to run the session. I'm following the instructions found at https://docs.microsoft.com/en-us/azure/virtual-machines/windows/excel-cluster-hpcpack to offload the workbook. Your answer is that the default limit can't be changed, correct?

    I used 'HPC Pack cluster for Excel workloads' to have Azure create the cluster for me. When using the cluster pack where can one choose the broker node? I didn't see this option when setting up the cluster.

    Tuesday, July 11, 2017 12:33 AM
  • Hi IvenBach,

    Right, the default EndRequests timeout is limited to 60 seconds and it cannot be changed in the Excel VBA macros. To work around this, you may split a large number of requests into smaller batches, e.g. 1,000 requests * 100 batches, and have each batch use a broker client in the HPCExcelClient.Run VBA call. We support multiple runs/broker clients in one SOA session.

    If you use the Azure ARM template to deploy the cluster, the broker node is also the head node. If you use the PowerShell deployment script, you may choose to create standalone broker nodes via the cluster configuration xml. Just note that in Cloud Service mode, the broker node VM should have the same name as its Cloud Service, thus only one broker node per Cloud Service.

    Regards,

    Yutong Sun 

    Tuesday, July 11, 2017 6:05 AM
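
    The batching workaround suggested above (e.g. 1,000 requests * 100 batches) comes down to partitioning the request indexes into fixed-size ranges, one range per run. A minimal sketch in Python, assuming requests can be addressed by a zero-based index; the function name batch_ranges is hypothetical and is not part of HPC Pack or the Excel client.

    ```python
    def batch_ranges(total_requests, batch_size=1000):
        """Yield (start, end) index pairs, end exclusive, covering all requests."""
        if total_requests < 0 or batch_size <= 0:
            raise ValueError("total_requests must be >= 0 and batch_size > 0")
        for start in range(0, total_requests, batch_size):
            # The last batch may be shorter than batch_size.
            yield start, min(start + batch_size, total_requests)


    print(list(batch_ranges(2500)))
    # [(0, 1000), (1000, 2000), (2000, 2500)]
    print(len(list(batch_ranges(100_000))))
    # 100
    ```

    Because the total is passed in at call time, this works when the request count is only known just before the session runs. Each (start, end) pair would correspond to one run over that slice of the calculations.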
  • From what I understood of your response 'We support multiple runs/broker clients in one SOA session.' you were referring to the IExcelClient.Initialize() method? I updated the dependFiles parameter with an argument of 'LocalFilePath1=RemoteFilePath1;LocalFilePath2=RemoteFilePath2;...;LocalFilePathN=RemoteFilePathN'. I ran my batches in groups of 1000 as you suggested and the Run() method ran for a little over 6 minutes before a BrokerPersistQueue error occurred, after the 60,000 ms limit. Do I need to limit each batch so that it runs in under a minute? It feels like there's something I'm still not understanding.
    Wednesday, July 12, 2017 12:19 AM
  • After working on it some more, it seems the issue was with messageThrottleStartThreshold and messageThrottleStopThreshold. With the default settings I would hit a BrokerPersistQueue error at some point during the run. With further testing I increased the StartThreshold to be above the number of requests I needed to send, and so far that has allowed it to run without any timeout issues. I read the description for the Start/StopThreshold https://technet.microsoft.com/en-us/library/ff877822(v=ws.10).aspx and understand that the memory limit may be a factor, and that setting the thresholds too close together may underutilize the nodes.

    • Marked as answer by IvenBach Saturday, July 22, 2017 12:07 AM
    Saturday, July 15, 2017 1:23 AM
  • Hi IvenBach,

    For multiple runs, it is HPCExcelClient.Run method. You may call this method in a loop, and partition the calculations according to the loop count.

    Right, message throttling could affect receiving requests at the broker side, so EndRequests may time out. If memory consumption is not a problem, you may increase the messageThrottleStartThreshold to avoid message throttling, so that all requests can be received and EndRequests can complete in a timely fashion.

    Regards,

    Yutong Sun

    Thursday, July 20, 2017 8:07 AM
  • Yutong,

    A) Should I create an array of HPCExcelClients and have each one call their respective Initialize and OpenSession methods in the loop? I received an error when trying to call HPCExcelClient.Initialize or HPCExcelClient.OpenSession after it was initially set for each. I would prefer to have smaller batches run and receive periodic results back instead of sending every request out and waiting for the entire run to finish.

    B) If I disconnect while I'm waiting for results, have I lost the entire run, or is there a way to reconnect and retrieve the results?

    Saturday, July 22, 2017 12:14 AM