HPC Pack 2016 scheduler error (HpcSchedulerStateful.exe service failed)

  • Question

  • Hi,

    I have a problem with the scheduler in HPC Pack and am trying to find the cause.
    This is the first time I have seen this error, and I could not find anything about it through Google or on blogs. Please help.

    I am using HPC Pack 2016 (version 5.3.6450).
    Connecting to the scheduler fails with a connect error, and I am trying to identify why.

    Event log:


    HpcSchedulerStateful.exe
    Framework version: v4.0.30319
    Description: The process was terminated due to an unhandled exception: System.OutOfMemoryException

    ~~~

    HPCScheduler.bin log:

    05/26/2020 06:24:00.136 w HpcSchedulerStateful.exe 8244 16744 [ExceptionWrapperErrorHandler] The exception is: System.InvalidOperationException: [message text not preserved]: 18.
       at System.Data.SqlClient.TdsParserStateObject.TryProcessHeader()
       at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
       at System.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Byte[] buff, Int32 offset, Int32 len, Int32& totalRead)
       at System.Data.SqlClient.TdsParser.TrySkipPlpValue(UInt64 cb, TdsParserStateObject stateObj, UInt64& totalBytesSkipped)
       at System.Data.SqlClient.TdsParser.TrySkipValue(SqlMetaDataPriv md, Int32 columnOrdinal, TdsParserStateObject stateObj)
       at System.Data.SqlClient.TdsParser.TrySkipRow(_SqlMetaDataSet columns, Int32 startCol, TdsParserStateObject stateObj)
       at System.Data.SqlClient.SqlDataReader.TryCleanPartialRead()
       at System.Data.SqlClient.SqlDataReader.TryCloseInternal(Boolean closeReader)
       at System.Data.SqlClient.SqlDataReader.Close()
       at System.Data.Common.DbDataReader.Dispose(Boolean disposing)
       at Microsoft.Hpc.Scheduler.Store.MultiTableQuery.Close()
       at Microsoft.Hpc.Scheduler.Store.MultiTableQuery.ExecuteRowSetRead(IEnumerable`1 ids)
       at Microsoft.Hpc.Scheduler.Store.QueryContextBase.ExecuteRowSetQuery(List`1 ids, PropertyId[] pids)
       at Microsoft.Hpc.Scheduler.Store.DynamicRowSet.GetData(Int32 firstRow, Int32 lastRow, Int32& rowCount)
       at Microsoft.Hpc.Scheduler.Store.SchedulerStoreInternal.RowSet_GetDataWithWindowBoundary(ConnectionToken& token, Int32 rowsetId, Int32 firstRow, Int32 lastRow)
       at SyncInvokeRowSet_GetDataWithWindowBoundary(Object , Object[] , Object[] )
       at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
       at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)

    ~~

    Physical memory is sufficient.


    Thursday, June 4, 2020 2:14 AM

All replies

  • Hi jmpark,

    It looks like the OOM occurred while loading a large data set from the scheduler database. Could you add the following two settings, gcServer and gcAllowVeryLargeObjects, inside the <runtime> section of the HpcSchedulerStateful.exe.config file under the Service Fabric folder (e.g. C:\ProgramData\SF\_Node_0\Fabric\work\Applications\HpcApplicationType_App0\SchedulerStatefulServicePkg.Code.1.0.0\HpcSchedulerStateful.exe.config) on all the head nodes, and then do a failover?

      <runtime>
        <gcServer enabled="true" />
        <gcAllowVeryLargeObjects enabled="true" />
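
    For orientation, the edited file might look roughly like the sketch below. It assumes the standard .NET layout with a <configuration> root and an existing <runtime> section. Everything else already in the file is unknown here and is represented only by comments, so treat this as an illustration of placement, not as the actual file contents.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <!-- other existing sections of the file are left untouched -->
      <runtime>
        <!-- the two settings suggested in this reply -->
        <gcServer enabled="true" />
        <gcAllowVeryLargeObjects enabled="true" />
        <!-- any elements already present under <runtime> stay in place -->
      </runtime>
    </configuration>
    ```

    gcServer switches the process to server garbage collection, and gcAllowVeryLargeObjects lets a 64-bit process allocate arrays larger than 2 GB in total size. That would fit an OutOfMemoryException that shows up even though physical memory is sufficient.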

    Let me know if this works.

    Regards,

    Yutong Sun

    Tuesday, June 9, 2020 3:41 PM