Answered Compute Node unreachable

  • Monday, March 19, 2012 3:57 PM
     
     

    My compute node stays unreachable even after I checked every point in this TechNet article:

    http://technet.microsoft.com/en-us/library/ee523229%28WS.10%29.aspx

    Everything works: ping from both sides, file share connections, ping to the domain controller, RDP directly from Cluster Manager...
    But the compute node stays unreachable.

    Where is the problem in my case?
    FYI, my configuration:
    Topology: compute nodes isolated on a private network
    The infrastructure is hosted by an external provider.

    BR

All Replies

  • Tuesday, March 20, 2012 3:29 AM
     
     

    Hello.

    Did you apply a node template to these nodes?

  • Tuesday, March 20, 2012 6:32 AM
     
     

    Hi,

    Sure, I have assigned a node template (a very simple one). Even when I take the node offline and reassign it, the node still stays unreachable.

  • Tuesday, March 27, 2012 11:47 PM
     
     

    Could you go to one of the CNs and make sure "HPCNodeManagerService" is running? It would also be helpful to check the event logs on the CNs.

    Michael

  • Thursday, March 29, 2012 9:40 AM
     
     

    Hi,

    HPCNodeManagerService is running, and there are no errors or warnings in the event log of the CN.

    BR

    Peter

  • Thursday, March 29, 2012 11:44 AM
     
     

    I found something:

    The host name of the CN is written in the hosts file and is not listed in DNS.
    So ping to the server works, but, logically, nslookup fails.

    Can that be a Problem for HPC?

    Peter
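The behaviour described in this post (ping works, nslookup fails) is what one would expect when a name exists only in the hosts file: ping goes through the operating system's resolver, which consults the hosts file, while nslookup queries the DNS server directly. As a minimal sketch, the following Python snippet lists which names a hosts file defines. The function name `parse_hosts`, the sample host `cn01`, and the address `10.0.0.5` are hypothetical and not taken from the thread.

```python
def parse_hosts(text):
    """Map each lowercase host name found in hosts-file text to its addresses."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


# Hypothetical hosts-file content, for illustration only.
SAMPLE = """
# hypothetical entries
127.0.0.1    localhost
10.0.0.5     cn01 cn01.example.local   # compute node
"""

if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE)
    print(hosts.get("cn01"))
```

To inspect a real machine, the file's content can be read and passed to `parse_hosts`; on Windows the hosts file is normally found under `%SystemRoot%\System32\drivers\etc\hosts`.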

  • Tuesday, April 03, 2012 12:13 AM
     
     

    Hi Peter,

    This doesn't look like a problem with HPC; you should work on getting your DNS server fixed. If DNS is not working, CNs will go unreachable.

    Michael

    • Proposed As Answer by Michael_Man Tuesday, April 03, 2012 12:13 AM
    • Unproposed As Answer by Peter Schaunitzer Wednesday, May 30, 2012 1:35 PM
  • Tuesday, May 29, 2012 10:51 AM
     
     

    Hi Michael,

    Now I have this problem with all of my compute nodes. Do they all need an entry in DNS?

    Is the configuration in ../etc/hosts not enough?

    I can ping the node, and Telnet from the head node to the compute node on the Node Management Service port (1856) also succeeds.

    BR

    Peter

  • Tuesday, May 29, 2012 8:22 PM
     
     

    Hi Peter,

    I'd say if you have the same entry in ../etc/hosts on the HN, you should be good to go.

    Michael

  • Wednesday, May 30, 2012 5:28 AM
     
     

    Hi,

    Yes, the entry is also defined in the hosts file of the head node.

    But still: "Compute Node unreachable"....

    Everything is working fine:

    • Telnet from HN to CN on Port 1856 ==> OK
    • RDP from HN to CN ==> OK
    • Ping from HN to CN ==> OK
    • Ping from CN to HN ==> OK
    • Assign node template to CN ==> OK

    So I don't understand why there is the error message "HPC Node Manager Service unreachable".

    And that is the problem on all CNs.
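The Telnet check in the list above can be repeated from a script. This is a minimal Python sketch, assuming only that the service listens on TCP port 1856 as stated in the thread; the function name `port_is_open` is hypothetical. A successful connection only shows that the port accepts TCP connections. As this thread shows, the node can still be reported as unreachable.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the compute node's name or address.
    # 1856 is the port that was tested with Telnet in the thread.
    print(port_is_open("127.0.0.1", 1856))
```

Running the check once with the node's name and once with its IP address helps separate a name resolution problem from a connectivity problem.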

  • Thursday, May 31, 2012 4:24 PM
     
     

    Hi Peter,

    Did you check your firewall? Could you try disabling firewalls on both sides?

    Michael

  • Friday, June 01, 2012 5:15 AM
     
     

    Hi Michael,

    The firewall is disabled on both sides. And when I look at TCPView or netstat, there are many open connections between CN and HN...

    BT

  • Friday, June 01, 2012 5:11 PM
     
     

    And you have the latest HPC Pack installed, right?

    Please email me at micman@microsoft.com so we can further discuss your problem, it's definitely very interesting to look at...

    Michael

  • Monday, June 04, 2012 8:46 AM
     
     

    Hi,

    Have you guys sorted out the problem? I have exactly the same problem and do not know how to solve it.

    Any help is appreciated.

    Thanks,

    zhigao

  • Monday, June 04, 2012 5:00 PM
     
     

    Hi Zhigao,

    Not yet. Did you try the same steps Peter did? Which version of HPC Pack are you running?

    Michael

  • Tuesday, June 05, 2012 2:38 AM
     
     

    Hi Michael,

    I have checked everything listed above. My version is 2008 R2. I tried disabling the firewall on both sides, and that did not work either. I am using topology 1, so at the moment the HN firewall is on and the CN firewall is off. Also, I did not activate the license on either the HN or the CN. Is this related to some group policy problem or to firewall rules? I am not quite sure about this part.

    I found one error in Event Viewer -> Applications and Services Logs -> Microsoft -> HPC -> Scheduler -> Operational: "Node *** is unreachable because no IPV4 address could be found for it." But the node has an internal IPv4 address, 192.168.1.3, which is shown on the Node -> Network tab. The details are below:

    //===========================================================================

    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
      <Provider Name="Microsoft-HPC-Scheduler" Guid="{5B169E40-A3C7-4419-A919-87CD93F2964D}" />
      <EventID>8</EventID>
      <Version>0</Version>
      <Level>2</Level>
      <Task>0</Task>
      <Opcode>0</Opcode>
      <Keywords>0x8000000000000000</Keywords>
      <TimeCreated SystemTime="2012-06-05T03:59:09.630972500Z" />
      <EventRecordID>2967</EventRecordID>
      <Correlation />
      <Execution ProcessID="1680" ThreadID="860" />
      <Channel>Microsoft-HPC-Scheduler/Operational</Channel>
      <Computer>WindowsHPCHead.WindowsHPCDomain.com</Computer>
      <Security UserID="S-1-5-18" />
      </System>
      <EventData>
      <Data Name="Message">Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it.</Data>
      <Data Name="ExceptionString">Exception detail: Microsoft.Hpc.Scheduler.Properties.SchedulerException: Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it. at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.Resolve(String name) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.CreateNewNodeManager(String nodeName) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback) Current stack: at Microsoft.Hpc.Scheduler.SchedulerTracing.TraceException(String facility, Exception exception) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeListener.ReportNodeInformationEx(String nodeName, ComputeClusterNodeInformation nodeInfo, String& logicalName) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs) at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext) at System.Runtime.Remoting.Messaging.ServerObjectTerminatorSink.SyncProcessMessage(IMessage reqMsg) at System.Runtime.Remoting.Messaging.ServerContextTerminatorSink.SyncProcessMessage(IMessage reqMsg) at System.Runtime.Remoting.Channels.CrossContextChannel.SyncProcessMessageCallback(Object[] args) at System.Runtime.Remoting.Channels.ChannelServices.DispatchMessage(IServerChannelSinkStack sinkStack, IMessage msg, IMessage& replyMsg) at System.Runtime.Remoting.Channels.BinaryServerFormatterSink.ProcessMessage(IServerChannelSinkStack sinkStack, IMessage requestMsg, ITransportHeaders requestHeaders, Stream requestStream, IMessage& responseMsg, ITransportHeaders& responseHeaders, Stream& responseStream) at 
System.Runtime.Remoting.Channels.Tcp.TcpServerTransportSink.ServiceRequest(Object state) at System.Runtime.Remoting.Channels.SocketHandler.ProcessRequestNow() at System.Runtime.Remoting.Channels.SocketHandler.BeginReadMessageCallback(IAsyncResult ar) at System.Net.LazyAsyncResult.Complete(IntPtr userToken) at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken) at System.Net.Security.NegotiateStream.ProcessFrameBody(Int32 readBytes, Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest) at System.Net.Security.NegotiateStream.ReadCallback(AsyncProtocolRequest asyncRequest) at System.Net.FixedSizeReader.CheckCompletionBeforeNextRead(Int32 bytes) at System.Net.FixedSizeReader.ReadCallback(IAsyncResult transportResult) at System.Net.LazyAsyncResult.Complete(IntPtr userToken) at System.Threading.ExecutionContext.runTryCode(Object userData) at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Net.ContextAwareResult.Complete(IntPtr userToken) at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken) at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)</Data>
      </EventData>
      </Event>

    Thanks & Regards,

    zhigao
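The event above reports that no IPv4 address could be found for the node. One way to see what the operating system's resolver returns for a name, restricted to IPv4, is the Python sketch below. It is an illustration only: the function name `ipv4_addresses` is hypothetical, and the scheduler's own lookup (the `NodeController.Resolve` frame in the trace) may behave differently from this check.

```python
import socket


def ipv4_addresses(name):
    """Return the sorted IPv4 addresses the OS resolver finds for name, or [] if none."""
    try:
        # AF_INET restricts the lookup to IPv4 results.
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Replace 127.0.0.1 with the node's fully qualified name as it appears in the event.
    print(ipv4_addresses("127.0.0.1"))
```

An empty list on the head node for the compute node's fully qualified name would be consistent with the error text, although it would not by itself prove the cause.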






    • Edited by zgcheng Tuesday, June 05, 2012 4:05 AM
  • Tuesday, June 05, 2012 6:23 AM
     
     Answered

    Hi,

    Thanks to Michael_Man:

    SP3 for HPC Pack 2008 R2 has solved my problem:

    http://www.microsoft.com/en-us/download/details.aspx?id=28017

    Now everything works fine.

    BR

    Peter

  • Tuesday, June 05, 2012 8:21 AM
     
     

    Hi,

    You are right. Updating to SP3 solved the problem. Thanks so much.

    Thanks & Regards,

    zhigao