SOA job failing ("Failed" state) - "Access denied by BrokerNodeAuthManager. WindowsIdeneity is not recognized"

    Question

  • I am on HPC Pack 2016 Update 1 (v 5.1.6086 SDK) and am trying to move my project from HPC Pack 2012 R2 to 2016 Update 1.

    I have an SOA job (with 1 request). The broker and the head node are on the same machine, and there is 1 workstation node in the cluster.

    The job goes into the "Failed" state almost immediately after I submit it.

    Here is the task output:

    Error from node: <WORKSTATIONNODE> :System.ServiceModel.ProtocolException: This channel can no longer be used to send messages as the output session was auto-closed due to a server-initiated shutdown. Either disable auto-close by setting the DispatchRuntime.AutomaticInputSessionShutdown to false, or consider modifying the shutdown protocol with the remote server.

    Server stack trace: 
       at System.ServiceModel.Channels.ServiceChannel.PrepareCall(ProxyOperationRuntime operation, Boolean oneway, ProxyRpc& rpc)
       at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

    Exception rethrown at [0]: 
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.Hpc.Activation.INodeManagerService.StartJobAndTask(Int32 jobId, String userAccount, Byte[] cipherText, Byte[] iv, Int32 taskId, ProcessStartInfo startInfo)
       at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeManagerServiceProxy.<>c__DisplayClass7_0.<StartJobAndTaskAsync>b__0(INodeManagerService c)
       at Microsoft.Hpc.WcfReliableClient`1.<>c__DisplayClass7_0.<InvokeOperationWithRetryAsync>b__0(T t)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__9`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__9`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeManagerServiceProxy.<InvokeOperationWithRetryAsync>d__2`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__7.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.WcfReliableClient`1.<InvokeOperationWithRetryAsync>d__6.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeManagerServiceProxy.<StartJobAndTaskAsync>d__7.MoveNext()

    The Message Details show the following trace text:

    "[HpcServiceHost]: Response is sent back. Is Fault = True"

    I tried enabling tracing in the configuration file for this service by copying the listeners from ccpechoservice.config, and I get this exception:

    The type initializer for 'Microsoft.Hpc.RuntimeTrace.TraceHelper' threw an exception.
    HpcSoa Error: 13 : [Session:62] System.TypeInitializationException: The type initializer for 'Microsoft.Hpc.RuntimeTrace.TraceHelper' threw an exception. ---> System.Configuration.ConfigurationErrorsException: Couldn't find type for class Microsoft.Hpc.Trace.HpcTraceListener, Microsoft.Hpc.Trace.
       at System.Diagnostics.TraceUtils.GetRuntimeObject(String className, Type baseType, String initializeData)
       at System.Diagnostics.TypedElement.BaseGetRuntimeObject()
       at System.Diagnostics.ListenerElement.GetRuntimeObject()
       at System.Diagnostics.ListenerElement.GetRuntimeObject()
       at System.Diagnostics.ListenerElementsCollection.GetRuntimeObject()
       at System.Diagnostics.TraceSource.Initialize()
       at Microsoft.Hpc.RuntimeTrace.RuntimeTraceWrapper..ctor()
       at Microsoft.Hpc.RuntimeTrace.TraceHelper..cctor()
       --- End of inner exception stack trace ---
       at Microsoft.Hpc.RuntimeTrace.TraceHelper.get_RuntimeTrace()
       at Microsoft.Hpc.CcpServiceHosting.CcpServiceHostWrapper.Dispose()
       at System.IDisposable.Dispose()
       at Microsoft.Hpc.CcpServiceHosting.Program.Main(String[] args)
    HpcSoa Information: 11 : [Session:62] Open dummy service...
    HpcSoa Information: 1002 : Servicehost is started.
    HpcSoa Verbose: 10 : [Session:62] [HpcServiceHost]: Task Id = 380
    HpcSoa Verbose: 10 : [Session:62] [HpcServiceHost]: Number of processors (service capability) = 1
    HpcSoa Information: 11 : [Session:62] [HpcServiceHost]: Cancel Task Grace Period = 15000
    HpcSoa Information: 11 : [Session:62] [HpcServiceHost]: First Allocated CoreId = 7
    HpcSoa Information: 11 : [Session:62] [HpcServiceHost]: EnableMessageLevelPreemption = True
    HpcSoa Error: 13 : [Session:62] [HpcServiceHost]: Cannot find service registration file.
    HpcSoa Verbose: 10 : [Session:62] [HpcServiceHost]: WCF network prefix is not set.
    HpcSoa Verbose: 10 : [Session:62] [HpcServiceHost]: ServiceOperationTimeout = 86400000, MaxMessageSize = 655360
    HpcSoa Information: 11 : [Session:62] [HpcServiceHost]: BrokerNodeAuthManager initialized. AllowerUser = domain\user, JobOwner = 
    HpcSoa Information: 11 : [Session:62] defaultBaseAddr of HostController is net.tcp://<>:9107/62/380
    HpcSoa Information: 11 : [Session:62] Created ServiceHost for controller.
    HpcSoa Information: 11 : [Session:62] Added endpoint to controller.
    HpcSoa Information: 11 : [Session:62] [HpcServiceHost]: BrokerNodeAuthManager initialized. AllowerUser = domain\user, JobOwner = 
    HpcSoa Information: 11 : [Session:62] Try to call _hostController.Open() below.
    HpcSoa Information: 11 : [Session:62] Controller opened.
    HpcSoa Verbose: 10 : [Session:62] [HpcServiceHost]: Dummy service opened on net.tcp://lco-td-1hbwjh2:9107/62/380/_defaultEndpoint
    HpcSoa Warning: 12 : [Session:62] [HpcServiceHost]: Access denied by BrokerNodeAuthManager. WindowsIdeneity is not recognized.
    HpcSoa Verbose: 10010 : [HpcServiceHost]: Response 00000000-0000-0000-0000-000000000000 is sent back. IsFault = True


    Lastly, ccpechosvc seems to work just fine.



    • Edited by SRIRAM R Wednesday, April 4, 2018 6:50 PM
    Wednesday, April 4, 2018 5:26 PM

All replies

  • Hi,

    If you copy the listener from ccpechoservice, you also have to copy CosmosLogging.dll, CosmosLoggingManaged.dll, Microsoft.Hpc.Trace.dll and probably Microsoft.Hpc.TraceCore.dll from the %CCP_HOME%bin folder into the same folder as your SOA service assemblies in order to use HpcTraceListener.
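
    The copy step above could be scripted roughly as follows. This is only a sketch: the DLL names come from this reply, while the function name `copy_trace_dlls` and both folder arguments are illustrative assumptions, not part of HPC Pack.

```python
import shutil
from pathlib import Path

# DLL names as listed in the reply; the last one is "probably" needed.
TRACE_DLLS = [
    "CosmosLogging.dll",
    "CosmosLoggingManaged.dll",
    "Microsoft.Hpc.Trace.dll",
    "Microsoft.Hpc.TraceCore.dll",
]


def copy_trace_dlls(bin_dir, service_dir, names=TRACE_DLLS):
    """Copy each listed DLL from bin_dir into service_dir.

    Returns (copied, missing) so the caller can see which files
    were not present in bin_dir. Hypothetical helper, not an HPC Pack API.
    """
    bin_dir, service_dir = Path(bin_dir), Path(service_dir)
    copied, missing = [], []
    for name in names:
        src = bin_dir / name
        if src.is_file():
            shutil.copy2(src, service_dir / name)  # keeps file timestamps
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing
```

    On a node, `bin_dir` would be the bin folder under the path held in the CCP_HOME environment variable, and `service_dir` the folder that holds your service assemblies. Any name reported in `missing` has to be located by hand.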

    After you enable the HPC Service Host log, please help us collect the HPC broker worker and HPC Service Host logs if this happens again, and tell us the UTC time of the repro. The logs are at %CCP_LOGROOT_SYS%SOA\HpcBrokerWorker_*.bin on the head node and %CCP_LOGROOT_USR%SOA\HpcServiceHost\%CCP_JOBID%\%CCP_TASKINSTANCEID%\Host_*.bin on the compute nodes. Please collect more than the 4 most recent logs to ensure they contain the repro. You can check whether the logs you collected contain the needed information using LogViewer, which you can download from https://hpconlineservice.blob.core.windows.net/logviewer/LogViewer.UI.application.
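
    Picking the most recent log files can be sketched as below. The helper `latest_logs` is a hypothetical name, and the default of 5 files is just one reading of "more than 4"; the file patterns are the ones quoted above.

```python
from pathlib import Path


def latest_logs(folder, pattern, count=5):
    """Return up to `count` files in `folder` matching `pattern`, newest first.

    Ordering uses each file's last-modified time. Hypothetical helper,
    not part of HPC Pack.
    """
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]
```

    On the head node this would be called with the SOA folder under the path held in CCP_LOGROOT_SYS and the pattern `HpcBrokerWorker_*.bin`. On a compute node, use the job and task folder under CCP_LOGROOT_USR and the pattern `Host_*.bin`.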

    Thanks,
    Zihao

    Monday, April 9, 2018 3:04 AM