HPC scheduler receiving timeouts on Refresh() calls

Question
-
Hi all,
We are having an intermittent issue with the HPC scheduler whereby a Refresh() call is timing out with the error:
Microsoft.Hpc.Scheduler.Properties.SchedulerException: Database exception:Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
When this error occurs, the compute nodes are flat out at 100%, but it only happens after several hours. Nothing else is running on the grid; the interface is single-threaded and does little more than make Refresh() calls periodically.
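While the root cause is being investigated, one common way to make a periodic polling loop tolerate an occasional timeout is to retry with exponential backoff. The following is only a minimal sketch in Python, not the HPC scheduler API: `refresh`, `RefreshTimeout`, and `refresh_with_retry` are hypothetical names standing in for the Refresh() call and the SchedulerException described above.

```python
import time


class RefreshTimeout(Exception):
    """Hypothetical stand-in for the timeout exception raised by Refresh()."""


def refresh_with_retry(refresh, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call refresh(); on a timeout, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return refresh()
        except RefreshTimeout:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retrying only hides the symptom; if the scheduler database is genuinely overloaded, backing off at least avoids adding more queries while it recovers.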
Here are the relevant log details:
The event log shows:
Log Name: Windows HPC Server
Source: Microsoft-Windows-HPCServer
Date: 7/27/2010 8:13:33 AM
Event ID: 24
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Description:
The scheduler got a SQL exception.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-HPCServer" Guid="{5b169e40-a3c7-4419-a919-87cd93f2964d}" />
<EventID>24</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x2000000000000000</Keywords>
<TimeCreated SystemTime="2010-07-27T02:43:33.922Z" />
<EventRecordID>17170</EventRecordID>
<Correlation />
<Execution ProcessID="2632" ThreadID="2916" />
<Channel>Windows HPC Server</Channel>
<Computer>sts091.nousblr-odc.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="SQLQuery">SELECT Tasks_Main2.ID
,Jobs.UnitType
,Tasks_Main2.InstanceValue
,Tasks_Settings2.CommandLine
,Tasks_Settings2.Runtime
,Tasks_Settings2.MinCores
,Tasks_Settings2.MaxCores
,Tasks_Settings2.MinNodes
,Tasks_Settings2.MaxNodes
,Tasks_Settings2.MinSockets
,Tasks_Settings2.MaxSockets
,Tasks_Settings2.IsRerunnable
,Tasks_Main2.RequeueCount
,Tasks_Settings2.DependsOnTasks
,Tasks_Settings2.RequiredNodes
,Tasks_Main2.ParentJobID
,Tasks_Settings2.IsExclusive
,Tasks_Settings2.NiceId
,Tasks_Main2.State
,Tasks_Main2.InstanceId
,ParametricTaskCounters.Canceled
,ParametricTaskCounters.Failed
,ParametricTaskCounters.Running
,ParametricTaskCounters.Queued
,Jobs.State
,Tasks_Settings2.Name
,Tasks_Settings2.GroupId
,Tasks_Settings2.IsParametric
,Tasks_Settings2.StartValue
,Tasks_Settings2.EndValue
,Tasks_Settings2.IncrementValue
FROM Jobs
INNER JOIN Tasks_Main2 ON Tasks_Main2.ParentJobID=Jobs.ID
INNER JOIN Tasks_Settings2 ON Tasks_Settings2.RecordId=Tasks_Main2.RecordId
INNER JOIN ParametricTaskCounters ON ParametricTaskCounters.RecordId=Tasks_Settings2.RecordId
WHERE Tasks_Main2.InstanceId<=@param0 AND Tasks_Main2.State=@param1 AND Jobs.State>=@param2 AND Jobs.State<=@param3 ORDER BY Tasks_Main2.ParentJobID ASC
-- 570375073
</Data>
<Data Name="Exception">System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at Microsoft.Hpc.Scheduler.Store.StoreSqlCommand.ExecuteReader()</Data>
</EventData>
</Event>
The SQL logs show the following:
'SELECT Tasks_Main2.ParentJobID<nl/><c/>Tasks_Main2.ID<nl/><c/>Tasks_Main2.RequestCancel<nl/><c/>Tasks_Main2.State<nl/><c/>Tasks_Main2.InstanceId<nl/><c/>ParametricTaskCounters.Canceled<nl/><c/>ParametricTaskCounters.Failed<nl/><c/>ParametricTaskCounters.Running<nl/><c/>ParametricTaskCounters.Queued<nl/><nl/>FROM Jobs<nl/>INNER JOIN Tasks_Main2 ON Tasks_Main2.ParentJobID=Jobs.ID<nl/>INNER JOIN Tasks_Settings2 ON Tasks_Settings2.RecordId=Tasks_Main2.RecordId<nl/>INNER JOIN ParametricTaskCounters ON ParametricTaskCounters.RecordId=Tasks_Settings2.RecordId<nl/><nl/>WHERE Tasks_Main2.InstanceId>=@param0 AND Tasks_Main2.RequestCancel<>@param1 AND Tasks_Main2.State<>@param2 AND Jobs.State=@param3<nl/>-- 761869105<nl/>'<c/> 'System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.<nl/> at System.Data.SqlClient.SqlConnection.OnError(SqlException exception<c/> Boolean breakConnection)<nl/> at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)<nl/> at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior<c/> SqlCommand cmdHandler<c/> SqlDataReader dataStream<c/> BulkCopySimpleResultSet bulkCopyHandler<c/> TdsParserStateObject stateObj)<nl/> at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()<nl/> at System.Data.SqlClient.SqlDataReader.get_MetaData()<nl/> at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds<c/> RunBehavior runBehavior<c/> String resetOptionsString)<nl/> at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior<c/> RunBehavior runBehavior<c/> Boolean returnStream<c/> Boolean async)<nl/> at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior<c/> RunBehavior runBehavior<c/> Boolean returnStream<c/> String method<c/> DbAsyncResult result)<nl/> at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior<c/> RunBehavior runBehavior<c/> 
Boolean returnStream<c/> String method)<nl/> at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior<c/> String method)<nl/> at System.Data.SqlClient.SqlCommand.ExecuteReader()<nl/> at Microsoft.Hpc.Scheduler.Store.StoreSqlCommand.ExecuteReader()',(0),24,,sts091.nousblr-odc.local
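The SQL log entry above is hard to read because it is stored on a single line with placeholder tokens. Comparing it with the query in the event log suggests that `<nl/>` stands for a line break and `<c/>` for a comma; that mapping is an assumption drawn from this post, not from documentation. Under that assumption, a small Python helper (the name `decode_log_field` is made up for illustration) can restore the original text:

```python
def decode_log_field(raw):
    """Expand the placeholder tokens seen in the SQL log entry.

    Assumes <nl/> encodes a newline and <c/> encodes a comma.
    """
    return raw.replace("<nl/>", "\n").replace("<c/>", ",")


sample = "SELECT Tasks_Main2.ParentJobID<nl/><c/>Tasks_Main2.ID<nl/>FROM Jobs"
print(decode_log_field(sample))
```

Running the whole log field through the helper makes it easier to compare the failing query and stack trace with the ones in the event log.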
Wednesday, July 28, 2010 8:32 AM
Answers
-
Have you resolved your issue?
- Marked as answer by Don Pattee (Moderator), Friday, February 4, 2011 7:36 PM
Wednesday, January 12, 2011 2:39 AM (Moderator)
All replies
-
Is this using HPC Server 2008 R2?
Tuesday, November 2, 2010 9:02 PM