Exception during clusrun

  • Question

  • I see intermittent but frequent exceptions when I use clusrun:

    C:\Users\strogansadmin>clusrun.exe /nodegroup:ComputeNodes dir D:\temp\
    Microsoft.Hpc.Scheduler.Properties.SchedulerException: Failed to load instance id 34648163-3a49-43d7-8f53-b0e4aa0a0187 in change c6c83642-c6cf-48a7-9fd5-0c3986d5cde0, revision 25 from the store.  ---> System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Failed to load instance id 34648163-3a49-43d7-8f53-b0e4aa0a0187 in change c6c83642-c6cf-48a7-9fd5-0c3986d5cde0, revision 25 from the store.

    Server stack trace:
       at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter)
       at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
       at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

    Exception rethrown at [0]:
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.Hpc.Scheduler.NodeManagement.INodeQuery.EnumerateNodesInNodeGroup(String groupName)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__9`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__8`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.NodeManagement.NodeQuery.InvokeOperation[T](Func`2 operation)
       --- End of inner exception stack trace ---
       at Microsoft.Hpc.Scheduler.NodeManagement.NodeQuery.InvokeOperation[T](Func`2 operation)
       at CliTools.NodeSelection.GetNodeListByNodeGroup(String headnode, String nodeGroup)
       at CliTools.NodeSelection.Resolve(IScheduler scheduler, ISchedulerStore store, Program owner)
       at CliTools.ClusRun.Execute(List`1 args)
       at CliTools.CommandVerbList.Execute(List`1 args)
       at CliTools.Program.RunList(CommandVerbList list, String[] args)

    Exceptions are especially frequent when I use the /nodeGroup option. What is wrong, and how can I fix it?

    Monday, September 16, 2019 9:23 AM

All replies

  • Hi sergueis,

    Which HPC Pack version are you running? (HPC Cluster Manager -> Help -> About)

    The issue looks like an error when querying nodes from a node group. You can try to reproduce it simply by running 'node list /group:ComputeNodes'. We may need to check the scheduler and management service logs to investigate further.

    Regards,

    Yutong Sun

    Sunday, September 22, 2019 8:03 AM
    Moderator
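
Because the failure is intermittent, a single run of the repro command may not trigger it. The following is a minimal sketch, not part of the original reply, that repeats the suggested 'node list /group:ComputeNodes' command and records which attempts fail. It assumes the HPC Pack `node` command is the one found on PATH (not Node.js), and that a failure shows up either as a non-zero exit code or as the text "SchedulerException" in the output; both are assumptions, and the helper names `run_once` and `count_failures` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Repro command suggested in the reply. Assumes the HPC Pack "node" CLI is on PATH.
REPRO_COMMAND = ["node", "list", "/group:ComputeNodes"]


def run_once(command: Sequence[str]) -> Tuple[int, str]:
    """Run the command once and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(list(command), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def count_failures(
    attempts: int,
    runner: Callable[[Sequence[str]], Tuple[int, str]] = run_once,
    command: Sequence[str] = REPRO_COMMAND,
    marker: str = "SchedulerException",
) -> List[int]:
    """Return the zero-based indices of the attempts that look like failures.

    An attempt counts as failed if the exit code is non-zero or the output
    contains the marker text (assumed failure signals, see the lead-in).
    """
    failures = []
    for attempt in range(attempts):
        exit_code, output = runner(command)
        if exit_code != 0 or marker in output:
            failures.append(attempt)
    return failures
```

Calling `count_failures(20)` on the head node would run the command 20 times and return the list of failing attempts, which gives a rough failure rate and a set of timestamps to line up against the scheduler and management service logs.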