Jobs failing to run on HPC Pack 2016

  • Question

  • When attempting to run a job on one of my cluster's compute nodes, I keep getting the following error:

    The job encountered an error: "Job failed to start on some nodes or some nodes are unreachable"

    Error from node: WINDY-CN-01:System.Runtime.Serialization.SerializationException: Unable to find assembly 'Microsoft.Hpc.NodeManager.RemotingExecutor, Version=5.0.0.0, Culture=neutral, PublicKeyToken=null'.
       at System.Runtime.Serialization.Formatters.Binary.BinaryAssemblyInfo.GetAssembly()
       at System.Runtime.Serialization.Formatters.Binary.ObjectReader.GetType(BinaryAssemblyInfo assemblyInfo, String name)
       at System.Runtime.Serialization.Formatters.Binary.ObjectMap..ctor(String objectName, String[] memberNames, BinaryTypeEnum[] binaryTypeEnumA, Object[] typeInformationA, Int32[] memberAssemIds, ObjectReader objectReader, Int32 objectId, BinaryAssemblyInfo assemblyInfo, SizedArray assemIdToAssemblyTable)
       at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadObjectWithMapTyped(BinaryObjectWithMapTyped record)
       at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()
       at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(HeaderHandler handler, __BinaryParser serParser, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
       at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream, HeaderHandler handler, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
       at Microsoft.Hpc.ExceptionWrapper.DeserializeException()
       at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeManagerServiceProxy.<InvokeOperationWithRetryAsync>d__2`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__7.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__6.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeManagerServiceProxy.<StartJobAndTaskAsync>d__7.MoveNext()

    Has anyone else bumped into an issue like this?

    Friday, January 26, 2018 12:27 PM

Answers

  • Please migrate your cluster to HPC Pack 2016 Update 1. The migration doc is here: https://technet.microsoft.com/en-us/library/mt829314(v=ws.11).aspx and Update 1 is available here: HPC Pack 2016 Update 1.


    Qiufang Shi

    • Marked as answer by wkerr128 Monday, February 5, 2018 11:48 AM
    Tuesday, January 30, 2018 5:35 AM
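
When checking nodes before or after a migration like the one suggested above, it can help to pull the assembly name and version out of the error text so they can be compared against what is installed. The following is only a minimal sketch in Python: `parse_missing_assembly` and the regular expression are hypothetical helpers written for this thread's error message, not part of HPC Pack or any Microsoft tooling.

```python
import re

# Hypothetical helper (an assumption, not an HPC Pack API): matches the
# assembly display name inside an "Unable to find assembly '...'" message.
ASSEMBLY_RE = re.compile(
    r"Unable to find assembly '(?P<name>[^,']+), Version=(?P<version>[\d.]+), "
    r"Culture=(?P<culture>[^,']+), PublicKeyToken=(?P<token>[^,']+)'"
)


def parse_missing_assembly(message):
    """Return the assembly fields as a dict, or None if the message does not match."""
    match = ASSEMBLY_RE.search(message)
    return match.groupdict() if match else None


# Error text copied from the question above.
error_text = (
    "Error from node: WINDY-CN-01:System.Runtime.Serialization.SerializationException: "
    "Unable to find assembly 'Microsoft.Hpc.NodeManager.RemotingExecutor, "
    "Version=5.0.0.0, Culture=neutral, PublicKeyToken=null'."
)

info = parse_missing_assembly(error_text)
print(info["name"], info["version"])
```

The parsed name and version only identify what the node failed to load; they do not by themselves say which component is out of date.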

All replies