We are developing a system that does data parallelism (image processing) and cannot use the parametric sweep features, since our process is non-numeric. Our tasks are added dynamically to a running job while it executes (what needs to be done depends on what is discovered in the image).
The tasks are added via a web service, so the time available to add them is limited to <300 seconds.
We have run into a barrier where we need to be able to add ~20,000 tasks to a running job.
I wrote a test app to compare SP1 with R2 RC1 and got the results below. The tests ran on VMs; for R2 RC1 the databases were hosted off the VM on another physical machine.
[SubmitTask(), 5,000 tasks total]

Hpc Ver   Job State     Avg. Time / Task (ms)
SP1       Running       519
SP1       Configuring    44
R2 RC1    Running       496
R2 RC1    Configuring    68

[SubmitTasks(), 5,000 tasks total]

Hpc Ver   Job State     Avg. Time / Task (ms)
R2 RC1    Running        20
R2 RC1    Configuring    64
NOTE: For the SP1 tests, running on a high-end physical machine instead of a virtual machine made little difference in the timings.
1. Adding a single task takes ~500 ms against a running job, or ~50 ms against a job still in the Configuring state.
2. Moving to SubmitTasks() is possible once R2 is released, but 20k tasks would still take about 6.7 minutes to submit, far over the web method call limit.
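As a sanity check on the 6.7-minute figure, here is a minimal arithmetic sketch using the 20 ms/task average measured for R2 RC1 SubmitTasks() in the table above:

```python
# Projected time to submit 20,000 tasks at the measured
# R2 RC1 SubmitTasks() rate of 20 ms per task.
tasks = 20_000
ms_per_task = 20                     # measured average, milliseconds
total_seconds = tasks * ms_per_task / 1000
total_minutes = total_seconds / 60

print(f"{total_seconds:.0f} s = {total_minutes:.1f} min")   # 400 s = 6.7 min

# Compare against the ~300 s web service ceiling:
web_limit_seconds = 300
print(total_seconds > web_limit_seconds)                     # True: over the limit
```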
None of the old or new methods is performant enough. Is there a better way to do this?
What are the limits on the number of tasks in a job? We are projecting that a single job (processing one image) could contain 100,000 to a million tasks by the end.
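For scale, extrapolating the measured SubmitTasks() rate to those projected task counts (assuming, optimistically, that the 20 ms/task average holds at that volume):

```python
# Extrapolated submission time at the measured 20 ms/task rate,
# assuming the rate does not degrade with job size.
ms_per_task = 20

for tasks in (100_000, 1_000_000):
    hours = tasks * ms_per_task / 1000 / 3600
    print(f"{tasks:>9,} tasks -> {hours:.1f} h")
```

Even at the improved R2 RC1 rate, a million-task job would take hours just to submit, which is why batching alone may not be sufficient here.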