9 February 2012 00:48
We are running a Monte Carlo model, typically over a large number of different cases (say 1500). Each case requires many iterations to account for randomness; we typically run about 1000 iterations per case. We initially set this up to run each case as a separate job (1500 jobs, for example), with the 1000 iterations handled by one (or sometimes more) parametric task per job. However, this resulted in a huge number of jobs in the queue, and it was extremely difficult to estimate overall progress. In addition, with balanced scheduling, all of these jobs competed with each other for resources, because the balancing occurs across all of them.
We then shifted gears and tried to put all these cases under a single job, with one parametric task per case (1500 parametric tasks in total). This worked perfectly in our tests: scheduling worked as desired, and progress was clear at the job level. During testing we only tried a few cases (~10 or so), so all was fine and dandy. Then, when we scaled back up to our typical workload of 1500 cases, we quickly ran into the limit that each job can have a maximum of 100 parametric sweep tasks.
Our first thought was to add 1000 separate tasks per case rather than one parametric task. However, submitting all these tasks would take days or weeks. Our next thought was to group the cases into jobs, for example 15 jobs with 100 parametric tasks each, covering the 1500 cases. However, this can get ugly for us, because a case sometimes doesn't know in advance how many parametric sweeps it will need, which makes it difficult to determine how many cases can be included under a single job.
Why does this 100 parametric task limitation exist and how can we best get around this limitation?
24 February 2012 02:38
I just wanted to give a quick status update on this. We found a suitable workaround that is slightly less convenient but still works.
Instead of creating multiple parametric sweep tasks, we created a single massive parametric sweep task over a set of integers. We then map each number to a specific case (or set of arguments). So the script that is being called accepts the number as an argument, does a lookup to determine what case it is, then executes it accordingly.
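For anyone wanting to try the same thing, here is a minimal sketch of what such a lookup could look like. It is not our actual script: the case names, the iteration counts, the `lookup` function, and the assumption that the sweep starts at index 1 are all hypothetical. It also allows each case to have a different number of iterations.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical case table: (case name, number of iterations for that case).
CASES = [("case_a", 1000), ("case_b", 250), ("case_c", 1000)]

# OFFSETS[i] is the first 0-based position that belongs to CASES[i];
# the last entry is the total number of sweep steps needed.
OFFSETS = [0] + list(accumulate(n for _, n in CASES))


def lookup(index):
    """Map a sweep index (assumed to start at 1) to (case name, iteration)."""
    pos = index - 1
    if not 0 <= pos < OFFSETS[-1]:
        raise ValueError(f"sweep index {index} is outside 1..{OFFSETS[-1]}")
    # Find the last case whose first position is <= pos.
    i = bisect_right(OFFSETS, pos) - 1
    return CASES[i][0], pos - OFFSETS[i]
```

The script that the sweep calls would convert its numeric argument to an integer, pass it to `lookup`, and then run the returned case with the returned iteration number (for example as a random seed offset).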
It isn't as descriptive from the UI, but it does get the job done. A parametric sweep task is limited to 1,000,000 steps, which was insufficient for some of our work, so we just made multiple sweeps in the same job.
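Splitting the index range into several sweeps can be sketched roughly as follows. The function name `sweep_ranges` is hypothetical, and the sketch assumes indices start at 1 and that the 1,000,000-step limit mentioned above applies per sweep task; each returned range would become one parametric sweep task in the same job.

```python
MAX_STEPS = 1_000_000  # per-sweep step limit as reported in this thread


def sweep_ranges(total, max_steps=MAX_STEPS):
    """Return inclusive (start, end) index ranges that together cover 1..total."""
    return [
        (start, min(start + max_steps - 1, total))
        for start in range(1, total + 1, max_steps)
    ]
```

Because the ranges are contiguous and do not overlap, the same number-to-case lookup keeps working unchanged across all sweeps.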
Hope this helps if you are having the same issues.
24 February 2012 02:40
Quite useful indeed - thanks for sharing! :-)