SQL Tuning to Address Job & Task Performance© 2009 Microsoft Corporation. 
All rights reserved.Wed, 03 Jun 2009 21:52:45 Z4e9b05fa-40e0-4c84-ae52-894e82742b97http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#4e9b05fa-40e0-4c84-ae52-894e82742b97http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#4e9b05fa-40e0-4c84-ae52-894e82742b97Surlyhttp://social.microsoft.com/Profile/en-US/?user=SurlySQL Tuning to Address Job & Task Performance<div>Like others I see posting here, we are experiencing performance problems on our Windows HPC 2008 cluster with task management in jobs that include hundreds of tasks.  I see from those other posts that MSFT is working to address these performance problems.  Will that come in the form of a service pack or something else?  When can we expect some help?</div> <div><br/></div> <div><br/></div> <div>As for right now, I'd like to ask what we can do to tune/optimize the SQL database.  To give some background, our application architecture has a workflow web service that directs work on our cluster.  This workflow web service uses the .NET Scheduler API to add tasks to existing jobs.  </div> <div><br/></div> <div>Here's what we're doing to squeeze better performance out of the scheduling, along with some questions around each.</div> <div><br/></div> <div>1) We've found that it takes seconds to establish a connection to the scheduler, so we're implementing a connection pool to manage ISchedulerJob instances.</div> <div><br/></div> <div>2) We've run SQL Profiler on the CCPClusterService database and see many operations that take 10+ seconds, so we're taking standard database administration steps: separating data, indexes, and log files, and running maintenance plans on the database.  Can we do any of the following?</div> <div><br/></div> <div>- use x64 SQL Server?</div> <div>- use SQL 2008?</div> <div>- offload the SQL load onto another box?   
I know HPC Head Node wants a COMPUTECLUSTER named instance but can it be remote?</div> <div>- tweak fill factor values to limit index fragmentation?</div> <div>- modify indexing?  does MSFT have any revised index guidance?</div> <div><br/></div> <div><br/></div> <div>I appreciate any feedback from the community and will share any practices that we come up with.</div> <div><br/></div> <div>Thanks!</div> <div>Luke</div> <div><br/></div>Thu, 23 Apr 2009 18:28:08 Z2009-04-23T18:28:08Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#87506f20-b6bc-479a-88e2-d3ae255cee9bhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#87506f20-b6bc-479a-88e2-d3ae255cee9bJosh Barnardhttp://social.microsoft.com/Profile/en-US/?user=Josh%20BarnardSQL Tuning to Address Job & Task Performance<p>Surly, it would be a huge help for us to get more information.  What types of jobs are you submitting?  How are you submitting them (code sample would be great!)?  How long do you expect it to take vs. how long is it actually taking?<br/><br/>More information on how to configure SQL with HPC is available here: <a href="http://go.microsoft.com/fwlink/?LinkId=137791">http://go.microsoft.com/fwlink/?LinkId=137791</a><br/><br/>Thanks!<br/>Josh</p><hr class="sig">-JoshFri, 24 Apr 2009 06:24:04 Z2009-04-24T06:24:04Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#aa02612b-3293-48fc-b8ea-6eef74e5f168http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#aa02612b-3293-48fc-b8ea-6eef74e5f168Surlyhttp://social.microsoft.com/Profile/en-US/?user=SurlySQL Tuning to Address Job & Task Performance<div>Hi Josh,</div> <div><br/></div> <div>Thanks for your reply and the pointer to that doc.  I reviewed it quickly and will go back in more detail.  From what I see, it is not possible to use a remote SQL database.  
If that is true, then it is a real flaw in the architecture that I hope you guys can address in the next release, because almost everyone has dollars committed to a robust SQL Server with sophisticated storage backing it before HPC comes into the picture.  Allow us to use that infrastructure as the SQL backend for the head node too.  Can't you just expose a connection string somewhere?</div> <div><br/></div> <div><br/></div> <div>We've got a 16-node, 128-core cluster that is running image processing on very large images (several gigabytes each).  We've got a hierarchical process of more than 25 steps that we run on each image, and it is totally data-driven, i.e. what we find in the image drives what the next step in the process is.  A job represents the processing of one image, and the tasks are dynamically determined and populated via the workflow web service.  We are not using MPI yet, so all of our parallelism is achieved through the scheduler: dynamically subdividing the image into processing units and executing each of those units as a task.  The EXE run by each task calls back to the workflow web service (WS) when complete and tells the WS what it did; the WS then loads the ISchedulerJob from HPC via an IScheduler connection, determines whether all sibling tasks are complete, and if so adds the next task in the sequence.</div> <div><br/></div> <div><br/></div> <div>A couple of odds and ends:</div> <div><br/></div> <div>- I said in my original post that we are pooling ISchedulerJob instances, but I meant IScheduler instances.</div> <div><br/></div> <div>- As a rule of thumb, our tasks will not take less than 30 seconds.  However, we could have 100+ tasks complete within a couple of seconds of each other.  Each of them would call back to the web service and cause a lookup of the job on the head node.  The last one to finish would cause another 100+ tasks to be added to the job, and this happens one by one because, as far as we know, there is no way to send a collection of tasks over in one call.  
This all should occur in a couple seconds (1-5).  I think API support for N tasks would really solve this issue because you could do one roundtrip to the database instead of N.</div> <div><br/></div> <div>- Our approach is partially complicated by struggles to make Task Dependencies work.  My recollection is that we couldn't get these to work w/o a pre-existing Job Template.  Again,  our process is quite dynamic so the pre-constructed job template isn't flexible enough.  Are there known bugs in this area of Task Dependencies?</div> <div><br/></div> <div><br/></div> <div>I'll ask my engineer responsible for these features to post code examples of our task addition and example of the task dependency problems we're experiencing.</div> <div><br/></div> <div>Thanks!</div> <div>Luke</div> <div><br/></div>Fri, 24 Apr 2009 18:00:50 Z2009-04-24T18:00:50Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#e4346ae9-14ed-4322-bd33-672a951191d5http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#e4346ae9-14ed-4322-bd33-672a951191d5Surlyhttp://social.microsoft.com/Profile/en-US/?user=SurlySQL Tuning to Address Job & Task PerformanceJosh, <div><br/></div> <div>One other thing, it would be EXTREMELY helpful to us if the task that failed was differentiated from all the others that are cancelled.  When one task fails it causes the others to be cancelled but they also show up as in Failed state in the Cluster/Job Manager's Job view.  
This forces us to hunt through every one of them currently.</div> <div><br/></div> <div>-Luke</div>Fri, 24 Apr 2009 18:20:57 Z2009-04-24T18:20:57Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3d73b38e-b279-4f05-8837-63e422132b54http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3d73b38e-b279-4f05-8837-63e422132b54Mithlanhttp://social.microsoft.com/Profile/en-US/?user=MithlanSQL Tuning to Address Job & Task Performance<div>Josh,</div> <div><br/></div> Below is a code example where you can't get two tasks to converge back into one task.  See code for more notes. <div><br/></div> <div>-Scott<br/> <div><br/></div> <div>//// UNIT TEST SET #1 OUTPUT</div> <div> <div>Took 749 ms to connect to devhead</div> <div>Job 5024</div> <div>Adding tasks - Level A</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5024.1 Refresh took 70 ms.</div> <div>Job Submitted 5024 SubmitJobById took 138 ms.</div> <div>Adding tasks - Level B</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5024.2 Refresh took 385 ms.</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5024.3 Refresh took 498 ms.</div> <div>Adding tasks - Level C</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>----&gt; Exception thrown that &quot;Task3&quot; does not exist.  
Task 4 shows up in the HPC Job Manager but is marked as failed.</div> </div> <div><br/></div> <div>//// UNIT TEST SET #2 OUTPUT</div> <div> <div>Took 621 ms to connect to devhead</div> <div>Job 5025</div> <div>Adding tasks - Level A</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5025.1 Refresh took 66 ms.</div> <div>Job Submitted 5025 SubmitJobById took 131 ms.</div> <div>Adding tasks - Level B</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5025.2 Refresh took 390 ms.</div> <div>Adding tasks - Level C</div> <div>Task Added 0 Refresh took 0 ms.</div> <div>Task Submitted 5025.3 Refresh took 496 ms.</div> <div>End.</div> <div><br/></div> </div> <div>//// CODE BELOW HERE</div> <div><br/></div> <div> <div>using System;</div> <div>using System.Collections.Generic;</div> <div>using System.Linq;</div> <div>using System.Text;</div> <div>using System.Diagnostics;</div> <div>using System.Threading;</div> <div><br/></div> <div>using Microsoft.Hpc.Scheduler;</div> <div>using Microsoft.Hpc.Scheduler.Properties;</div> <div><br/></div> <div>namespace HpcDependsTest</div> <div>{</div> <div><span style="white-space:pre"> </span>class Program</div> <div><span style="white-space:pre"> </span>{</div> <div><span style="white-space:pre"> </span>static void Main(string[] args)</div> <div><span style="white-space:pre"> </span>{</div> <div><span style="white-space:pre"> </span>String sHeadnode = &quot;devhead&quot;;</div> <div><span style="white-space:pre"> </span>string[] sTaskNamesEmpty = {};</div> <div><br/></div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>//UNIT TEST SET #1</div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>// THIS FAILS</div> <div><span style="white-space:pre"> </span>//</div> <div><br/></div> <div><span style="white-space:pre"> </span>//  T         Task 1</div> <div><span style="white-space:pre"> </span>//  i        /     
\</div> <div><span style="white-space:pre"> </span>//  m       |       |</div> <div><span style="white-space:pre"> </span>//  e    Task 2   Task 3</div> <div><span style="white-space:pre"> </span>//          |       |</div> <div><span style="white-space:pre"> </span>//  |        \     /</div> <div><span style="white-space:pre"> </span>//  v        Task 4</div> <div><br/></div> <div><span style="white-space:pre"> </span>string[] sTaskNamesA = {&quot;Task1&quot;};</div> <div><span style="white-space:pre"> </span>string[] sTaskNamesB = {&quot;Task2&quot;, &quot;Task3&quot;};</div> <div><span style="white-space:pre"> </span>string[] sTaskNamesC = {&quot;Task4&quot;};</div> <div><br/></div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>//UNIT TEST SET #1</div> <div><span style="white-space:pre"> </span>//</div> <div><br/></div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>//UNIT TEST SET #2</div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>// THIS PASSES</div> <div><span style="white-space:pre"> </span>//</div> <div><br/></div> <div><span style="white-space:pre"> </span>//  T         Task 1</div> <div><span style="white-space:pre"> </span>//  i           |     </div> <div><span style="white-space:pre"> </span>//  m           |</div> <div><span style="white-space:pre"> </span>//  e         Task 2   </div> <div><span style="white-space:pre"> </span>//              |</div> <div><span style="white-space:pre"> </span>//  |           |</div> <div><span style="white-space:pre"> </span>//  v        Task 4</div> <div><br/></div> <div><span style="white-space:pre"> </span>//string[] sTaskNamesA = { &quot;Task1&quot; };</div> <div><span style="white-space:pre"> </span>//string[] sTaskNamesB = { &quot;Task2&quot; };</div> <div><span style="white-space:pre"> </span>//string[] sTaskNamesC = { &quot;Task4&quot; };</div> <div><br/></div> 
<div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>//UNIT TEST SET #2</div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span></div> <div><span style="white-space:pre"> </span>Stopwatch watch = new Stopwatch();</div> <div><span style="white-space:pre"> </span>watch.Start();</div> <div><span style="white-space:pre"> </span>IScheduler scheduler = new Scheduler();</div> <div><span style="white-space:pre"> </span>scheduler.Connect(sHeadnode);</div> <div><span style="white-space:pre"> </span>watch.Stop();</div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Took &quot; + watch.ElapsedMilliseconds + &quot; ms to connect to &quot; + sHeadnode);</div> <div><span style="white-space:pre"> </span>ISchedulerJob Job = scheduler.CreateJob();</div> <div><span style="white-space:pre"> </span>Job.Name = &quot;HpcDependsTest&quot;;</div> <div><span style="white-space:pre"> </span>Job.FailOnTaskFailure = true;</div> <div><span style="white-space:pre"> </span>Job.IsExclusive = false;</div> <div><br/></div> <div><span style="white-space:pre"> </span>scheduler.AddJob(Job);</div> <div><span style="white-space:pre"> </span>Job.Refresh();  //To get Job Id</div> <div><span style="white-space:pre"> </span>int iJobId = Job.Id;</div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Job &quot; + iJobId);</div> <div><br/></div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Adding tasks - Level A&quot;);</div> <div><span style="white-space:pre"> </span>Addtasks(Job, sTaskNamesA, sTaskNamesEmpty);</div> <div><br/></div> <div><span style="white-space:pre"> </span>watch.Reset();</div> <div><span style="white-space:pre"> </span>watch.Start();</div> <div><span style="white-space:pre"> </span>scheduler.SubmitJobById(iJobId, &quot;USER NAME HERE&quot;, &quot;PASSWORD HERE&quot;);</div> <div><span style="white-space:pre"> </span>watch.Stop();</div> <div><span 
style="white-space:pre"> </span>Console.WriteLine(&quot;Job Submitted &quot; + iJobId + &quot; SubmitJobById took &quot; + watch.ElapsedMilliseconds + &quot; ms.&quot;);</div> <div><br/></div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Adding tasks - Level B&quot;);</div> <div><span style="white-space:pre"> </span>Addtasks(Job, sTaskNamesB, sTaskNamesA);</div> <div><span style="white-space:pre"> </span></div> <div><span style="white-space:pre"> </span>//</div> <div><span style="white-space:pre"> </span>//FOR UNIT TEST SET #1</div> <div><span style="white-space:pre"> </span>//None of these next lines will allow you to add Task4 w/ dependencies to Task2 &amp; Task3</div> <div><span style="white-space:pre"> </span>//You fail during submit of Task 4 at Job.SubmitTask(task); with this exception</div> <div><br/></div> <div><span style="white-space:pre"> </span>//Job.Commit();</div> <div><span style="white-space:pre"> </span>//Job.Refresh();</div> <div><span style="white-space:pre"> </span>//Job = scheduler.OpenJob(iJobId);</div> <div><span style="white-space:pre"> </span></div> <div><span style="white-space:pre"> </span>/*</div> <div><span style="white-space:pre"> </span> * Invalid task dependency: There is no task with the name Task3.  
Check your spelling and try again.</div> <div><span style="white-space:pre"> </span> */</div> <div><span style="white-space:pre"> </span>//FOR UNIT TEST SET #1</div> <div><span style="white-space:pre"> </span>//</div> <div><br/></div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Adding tasks - Level C&quot;);</div> <div><span style="white-space:pre"> </span>Addtasks(Job, sTaskNamesC, sTaskNamesB);</div> <div><br/></div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;End.&quot;);</div> <div><span style="white-space:pre"> </span>}</div> <div><br/></div> <div><span style="white-space:pre"> </span>public static void Addtasks(ISchedulerJob Job, string[] taskNames, string[] taskDepsNames)</div> <div><span style="white-space:pre"> </span>{</div> <div><span style="white-space:pre"> </span>Stopwatch watch = new Stopwatch();</div> <div><span style="white-space:pre"> </span>foreach( String sTaskName in taskNames )</div> <div><span style="white-space:pre"> </span>{</div> <div><span style="white-space:pre"> </span>ISchedulerTask task = Job.CreateTask();</div> <div><span style="white-space:pre"> </span>task.CommandLine = &quot;ping -n 5 localhost&quot;;</div> <div><span style="white-space:pre"> </span>task.Name = sTaskName;</div> <div><span style="white-space:pre"> </span>task.IsParametric = false;</div> <div><span style="white-space:pre"> </span>task.IsExclusive = false;</div> <div><span style="white-space:pre"> </span>task.Runtime = 1440 * 60;</div> <div><span style="white-space:pre"> </span>task.StdOutFilePath = &quot;NUL&quot;;</div> <div><span style="white-space:pre"> </span>task.StdErrFilePath = &quot;NUL&quot;;</div> <div><br/></div> <div><span style="white-space:pre"> </span>foreach( string sTaskDepName in taskDepsNames)</div> <div><span style="white-space:pre"> </span>{</div> <div><span style="white-space:pre"> </span>task.DependsOn.Add(sTaskDepName);</div> <div><span style="white-space:pre"> </span>}</div> <div><br/></div> 
<div><span style="white-space:pre"> </span>watch.Reset();</div> <div><span style="white-space:pre"> </span>watch.Start();</div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Task Added &quot; + task.TaskId.JobTaskId + &quot; Refresh took &quot; + watch.ElapsedMilliseconds + &quot; ms.&quot;);</div> <div><span style="white-space:pre"> </span>watch.Stop();</div> <div><span style="white-space:pre"> </span></div> <div><span style="white-space:pre"> </span>watch.Reset();</div> <div><span style="white-space:pre"> </span>watch.Start();</div> <div><span style="white-space:pre"> </span>Job.SubmitTask(task);</div> <div><span style="white-space:pre"> </span>watch.Stop();</div> <div><span style="white-space:pre"> </span>Console.WriteLine(&quot;Task Submitted &quot; + task.TaskId + &quot; Refresh took &quot; + watch.ElapsedMilliseconds + &quot; ms.&quot;);</div> <div><span style="white-space:pre"> </span>}</div> <div><span style="white-space:pre"> </span>}</div> <div><span style="white-space:pre"> </span>}</div> <div>}</div> <div><br/></div> </div> </div>Fri, 24 Apr 2009 20:00:09 Z2009-04-24T20:00:09Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3fa29926-d1ef-4341-acbc-b27a0e91db98http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3fa29926-d1ef-4341-acbc-b27a0e91db98Mithlanhttp://social.microsoft.com/Profile/en-US/?user=MithlanSQL Tuning to Address Job & Task Performance<div><br/></div> Our process workflow, as Luke mentioned, is dynamic.  What we find in the image determines what happens later in the process.  The process in place has 3 processing levels: whole image, part of an image (blob), and the tiles that make up the blob.  To give you an idea of quantities, there are a few blobs per image and approximately 100 tiles per blob. <div><br/></div> <div>Each HPC task, prior to exiting, calls back to the central web service that manages the job. 
In this callback we determine which tasks need to be added next.  </div> <div><br/></div> <div>When we go from many tiles back to processing the blob, all the tiles must complete processing before the next step runs on the blob.  This is the case that does not work in the test code above.</div> <div><br/></div> <div>To work around this, the callback won't add the next step for the blob until the last tile step calls back to report its completion.  </div> <div>For each tile callback, a GetTaskList() query is run to determine whether this is the last task to complete, so that we can move to the next step.  This adds quite a bit of load on the HPC SQL instance, since 100 of them can be asking the same thing nearly all at once.</div> <div>//</div> <div> <div><span style="white-space:pre"> </span>IFilterCollection filters = GetScheduler().CreateFilterCollection();</div> <div><span style="white-space:pre"> </span>filters.Add(FilterOperator.Equal, PropId.Task_State, TaskState.Finished);</div> <div><span style="white-space:pre"> </span>filters.Add(FilterOperator.Equal, PropId.Task_ParentJobId, m_iJobId);</div> <div><span style="white-space:pre"> </span>ISchedulerCollection tasks = GetTaskList(filters, null, false);</div> </div> <div>//</div> <div><br/></div> <div>If we could get the dependency tree to work, the GetTaskList(..) call above would no longer be needed.  
This would definitely help with scalability and performance, and we could move to having only one of the tile tasks call back to the web service.</div> <div><br/></div> <div>Performance savings all around.</div> <div><br/></div> <div>-Scott</div>Fri, 24 Apr 2009 21:10:15 Z2009-04-24T21:10:15Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#32f16c2e-92dc-4043-9e10-24f766afa738http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#32f16c2e-92dc-4043-9e10-24f766afa738Josh Barnardhttp://social.microsoft.com/Profile/en-US/?user=Josh%20BarnardSQL Tuning to Address Job & Task Performance<blockquote>Josh, <div><br/></div> <div>One other thing, it would be EXTREMELY helpful to us if the task that failed was differentiated from all the others that are cancelled.  When one task fails it causes the others to be cancelled but they also show up as in Failed state in the Cluster/Job Manager's Job view.  This forces us to hunt through every one of them currently.</div> <div><br/></div> <div>-Luke</div> </blockquote> <br/>We're working on making this easier to diagnose in v3.<hr class="sig">-JoshMon, 11 May 2009 22:31:00 Z2009-05-11T22:31:00Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#6948cfe7-0793-4f29-9c9b-b08821a10480http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#6948cfe7-0793-4f29-9c9b-b08821a10480Josh Barnardhttp://social.microsoft.com/Profile/en-US/?user=Josh%20BarnardSQL Tuning to Address Job & Task Performance<blockquote> <div>Josh,</div> <div><br/></div> Below is a code example where you can't get two tasks to converge back into one task.  See code for more notes. <div><br/></div> <div>-Scott</div> </blockquote> <br/>Thanks for this amazingly detailed response :-)  I believe we've repro'd your issue.  
We will be looking into it over the next few days and I'll try to get back to you once we've figured it out.<br/><br/>Most likely it is a bug in adding tasks with dependencies to running jobs.<br/><br/>Thanks!<br/>Josh<br/><hr class="sig">-JoshMon, 11 May 2009 22:55:31 Z2009-05-11T22:55:31Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#cc67acce-6fec-472f-98de-e7577aa84152http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#cc67acce-6fec-472f-98de-e7577aa84152Surlyhttp://social.microsoft.com/Profile/en-US/?user=SurlySQL Tuning to Address Job & Task PerformanceIn case others experience this, we've found that adding the &quot;Error Message&quot; column to the task view can help identify the failed task.  However, I still think identifying tasks as failed vs. cancelled would be preferable. <div><br/></div> <div><br/></div> <div>Josh,</div> <div><br/></div> <div>Any feedback on the other questions: the SQL connection string and adding a task list in one shot?</div> <div><br/></div> <div>Thanks,</div> <div>Luke</div>Tue, 12 May 2009 22:24:31 Z2009-05-12T22:24:31Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3e69631e-58b1-4ef9-b837-817b01f780b1http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#3e69631e-58b1-4ef9-b837-817b01f780b1Josh Barnardhttp://social.microsoft.com/Profile/en-US/?user=Josh%20BarnardSQL Tuning to Address Job & Task PerformanceRemote SQL databases are unfortunately not supported in v2; we hope to provide support for this in v3.<br/><br/>Adding tasks in a batch isn't supported in v2 either.  We have a fix (coming in SP1) to make it a bit faster to add multiple tasks at once using XML.  
This is another thing that we are looking into for v3.<br/><br/>Thanks!<br/>Josh<hr class="sig">-JoshTue, 19 May 2009 21:14:48 Z2009-05-19T21:14:48Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#fdba4361-c7da-4ee1-98e4-4d3529431aachttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#fdba4361-c7da-4ee1-98e4-4d3529431aacMithlanhttp://social.microsoft.com/Profile/en-US/?user=MithlanSQL Tuning to Address Job & Task PerformanceJosh, <div><br/></div> <div>Any news on whether this will be in SP1?</div> <div><br/></div> <div>Thanks,</div> <div>Scott</div>Fri, 29 May 2009 18:17:57 Z2009-05-29T18:17:57Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#34b41ced-d965-487c-bbbc-8febc8135a7chttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4e9b05fa-40e0-4c84-ae52-894e82742b97#34b41ced-d965-487c-bbbc-8febc8135a7cJosh Barnardhttp://social.microsoft.com/Profile/en-US/?user=Josh%20BarnardSQL Tuning to Address Job & Task PerformanceThe dependency issue you reported should be fixed in SP1, as should large job XML import.<br/><br/>Thanks!<br/>Josh<hr class="sig">-JoshWed, 03 Jun 2009 21:50:21 Z2009-06-03T21:50:21Z