Our process discovers the next set of tasks dynamically while a job is running. Each block consists of 1-300 tasks, and the last task in a block queues the next block. If a block is large enough, its tasks are added to the same job using multiple
threads and scheduler connections.
The bad behavior we see at random, at least once per week, is a job going 'crazy' and adding the same task over and over. Sometimes it is one task; at other times a few tasks keep getting added. We
have seen it reach thousands of copies of the same repeated task before either the job is killed or we cancel it. The repeated tasks have the same name, which previously threw an exception on the client programmatic side. Recently I added extra logging
to our code just prior to the .AddTask() call to verify that it wasn't something in our code getting into a loop.
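To make that kind of check concrete, here is a minimal sketch of a thread-safe guard that logs every submit and blocks a repeated task name before it reaches the scheduler. It is written in Python purely for illustration. `DuplicateTaskGuard`, `try_submit`, `submit_block` and the `add_task` callback are hypothetical names, not our actual code and not part of the HPC API; `add_task` only stands in for the real .AddTask() call.

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("task_guard")


class DuplicateTaskGuard:
    """Tracks task names already submitted to one job (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()
        self.duplicates = []

    def try_submit(self, name, add_task):
        # Check and record under one lock so two threads cannot both
        # pass the check for the same task name.
        with self._lock:
            if name in self._seen:
                self.duplicates.append(name)
                log.warning("duplicate submit blocked: %s", name)
                return False
            self._seen.add(name)
        log.info("submitting task %s", name)
        add_task(name)  # stand-in for the real scheduler call
        return True


def submit_block(names, add_task, guard, workers=4):
    """Submit one block of task names from several threads.

    Returns the number of tasks actually passed to add_task.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda n: guard.try_submit(n, add_task), names))
    return sum(results)
```

If a client-side log like this shows each task name submitted exactly once while the job itself shows thousands of copies, that would suggest the repetition is happening after the client call rather than in the submitting loop.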
I have screenshots of the job. No error events showed up in the Event Viewer.
We experienced this in HPC 2008 SP1, and after upgrading to HPC 2008 R2 SP1 in the last month we still see the problem. We have not been able to come up with a simple reproduction case, but it does happen fairly frequently.
Anyone else having this problem?