Long vs. short-running jobs

  • Question

  • We're evaluating HPC to see how the following issues can be addressed:

    a. ~100,000 short-running atomic calculations processed by the cluster, with results returned to the client. Each calculation is very fast, so submitting the calculation requests in batches seems like the way to go to avoid session start-up costs and similar overhead. The calculations have no dependencies on one another; it's pure number-crunching.

    b. ~2,000 longer-running calculations (each takes ~1-2 minutes) processed by the cluster, with results returned to the client. These have no dependencies either; they merely need more time to produce the final set of results.
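
    To make the batching idea in (a) concrete, here is a rough sketch of what we have in mind. It is plain Python rather than the HPC Pack API: `calculate`, `run_batch`, `run_all`, the batch size of 1,000 and the local thread pool standing in for the cluster are all assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunked(items: Sequence[float], batch_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def calculate(x: float) -> float:
    """Hypothetical stand-in for one short atomic calculation."""
    return x * x


def run_batch(batch: Sequence[float]) -> List[float]:
    """Process a whole batch per request so per-request overhead is paid once per batch."""
    return [calculate(x) for x in batch]


def run_all(inputs: Sequence[float], batch_size: int = 1000, workers: int = 8) -> List[float]:
    """Submit all batches to a local pool (stand-in for the cluster) and flatten the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order, so output order matches input order.
        batch_results = pool.map(run_batch, chunked(inputs, batch_size))
        return [result for batch in batch_results for result in batch]
```

    With 100,000 inputs and a batch size of 1,000 this issues 100 requests instead of 100,000. The right batch size presumably depends on how the per-request overhead compares with the cost of a single calculation.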

    The SOA approach of developing services exposing various calculation/valuation scenarios seems like an excellent fit, but a few questions arose in the process of looking at it:

    1. What approaches are people using for this scenario? I've looked at the SOA tutorials/samples recently provided by the MSFT team, and they are helpful, but the main question remains: for ~100,000 short-running calculations, does it make sense to submit the requests in one batch and await the results (synchronously or asynchronously), or are there better ways to go?

    2. For the longer-running calculations, we need to be able to prioritize them so that they don't take all the resources and leave the short-running calculations in limbo waiting for them to complete. Job templates and priorities seem to be the way to address this. Are we going down the right path here?

    Any thoughts/takes on the above would be greatly appreciated!

    Thursday, April 25, 2013 5:35 PM