Compute Node unreachable

  • Question

  • My compute node stays unreachable even after I checked every point in this TechNet post:

    http://technet.microsoft.com/en-us/library/ee523229%28WS.10%29.aspx

    Everything works: ping in both directions, file share connections, ping to the domain controller, RDP directly from Cluster Manager...
    But the compute node stays unreachable.

    Where is the problem in my case?
    FYI, my configuration:
    Topology: compute nodes isolated on a private network
    The infrastructure is hosted by an external provider.

    BR

    Monday, March 19, 2012 3:57 PM

Answers

  • Hi,

    Thanks to Michael_Man:

    SP3 for HPC Pack 2008 R2 has solved my problem:

    http://www.microsoft.com/en-us/download/details.aspx?id=28017

    Now everything works fine.

    BR

    Peter

    Tuesday, June 5, 2012 6:23 AM

All replies

  • Hello.

    Did you apply a node template to these nodes?

    Tuesday, March 20, 2012 3:29 AM
  • Hi,

    Sure, I have assigned a node template (a very simple one). Even when I take the node offline and reassign the template, the node still stays unreachable....

    Tuesday, March 20, 2012 6:32 AM
  • Could you go to one of the CNs and make sure "HPCNodeManagerService" is running? It would also be helpful to check the event logs on the CNs.

    Michael

    Tuesday, March 27, 2012 11:47 PM
  • Hi,

    HPCNodeManagerService is running, and there are no errors or warnings in the event log of the CN.

    BR

    Peter

    Thursday, March 29, 2012 9:40 AM
  • I found something:

    The host name of the CN is written in the hosts file and is not listed in DNS.
    So ping to the server works, but nslookup naturally fails.

    Can that be a problem for HPC?

    Peter

    Thursday, March 29, 2012 11:44 AM
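
A name that exists only in the hosts file explains why ping works while nslookup fails: nslookup queries the DNS server directly and does not read the hosts file. The following is a minimal Python sketch for spotting such names, not anything that ships with HPC Pack. The helpers `parse_hosts` and `hosts_only_names` are hypothetical, the host names `cn001` and `headnode` are made-up examples, and the list of DNS-registered names is assumed to be collected by hand (for example from nslookup results).

```python
from pathlib import Path

# Default hosts-file location on Windows; adjust if the system root differs.
HOSTS_PATH = Path(r"C:\Windows\System32\drivers\etc\hosts")


def parse_hosts(text):
    """Map each lower-cased host name in hosts-file text to its addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


def hosts_only_names(text, dns_names):
    """Return names defined in the hosts file but absent from dns_names."""
    known = {name.lower() for name in dns_names}
    return sorted(name for name in parse_hosts(text) if name not in known)


# Hypothetical hosts-file content; the node names are illustrative only.
sample = """
# compute nodes on the private network
192.168.1.3   cn001 cn001.example.local
192.168.1.1   headnode
"""

print(parse_hosts(sample))
print(hosts_only_names(sample, ["headnode"]))
```

On a real node, `parse_hosts(HOSTS_PATH.read_text())` would read the actual file. Any name this reports is one that DNS-only lookups will not find.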
  • Hi Peter,

    It doesn't look like a problem with HPC; you should work on getting your DNS server fixed. If DNS is not working, CNs will go unreachable.

    Michael

    • Proposed as answer by Michael_Man Tuesday, April 3, 2012 12:13 AM
    • Unproposed as answer by Peter Schaunitzer Wednesday, May 30, 2012 1:35 PM
    Tuesday, April 3, 2012 12:13 AM
  • Hi Michael,

    Now I have this problem with all of my compute nodes. Do they all need an entry in DNS?

    Is the configuration in ../etc/hosts not enough?

    I can ping the node, and Telnet from the head node to the compute node for the Node Management Service (port 1856) is also successful.

    BR

    Peter

    Tuesday, May 29, 2012 10:51 AM
  • Hi Peter,

    I'd say if you have the same entry in ../etc/hosts on the HN, you should be good to go.

    Michael

    Tuesday, May 29, 2012 8:22 PM
  • Hi,

    Yes, the entry is also defined in the hosts file of the head node.

    But still: "Compute Node unreachable"....

    Everything is working fine:

    • Telnet from HN to CN on port 1856 ==> OK
    • RDP from HN to CN ==> OK
    • Ping from HN to CN ==> OK
    • Ping from CN to HN ==> OK
    • Assign node template to CN ==> OK

    So I don't understand why there is the error message "HPC Node Manager Service unreachable".

    And that's the problem on all CNs.

    Wednesday, May 30, 2012 5:28 AM
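
The Telnet check on port 1856 from the list above can be scripted so it is easy to repeat across all compute nodes. This is a minimal Python sketch under stated assumptions: `can_connect` is a hypothetical helper, and the port number is simply the one Peter tested. It only shows that a TCP connection can be opened; as this thread demonstrates, an open port does not by itself mean the head node will mark the node reachable.

```python
import socket

NODE_MANAGER_PORT = 1856  # the port tested with Telnet in this thread


def can_connect(host, port=NODE_MANAGER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the head node, a call such as `can_connect("cn001")` (a made-up node name) would be the equivalent of the Telnet test.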
  • Hi Peter,

    Did you check your firewall? Could you try disabling firewalls on both sides?

    Michael

    Thursday, May 31, 2012 4:24 PM
  • Hi Michael,

    The firewall is disabled on both sides. And when I look at TCPView or netstat, there are many open connections between the CN and the HN...

    BR

    Friday, June 1, 2012 5:15 AM
  • And you have the latest HPC Pack installed, right?

    Please email me at micman@microsoft.com so we can discuss your problem further; it's definitely very interesting to look at...

    Michael

    Friday, June 1, 2012 5:11 PM
  • Hi,

    Have you sorted out the problem? I have exactly the same problem and do not know how to solve it.

    Any help is appreciated.

    Thanks,

    zhigao

    Monday, June 4, 2012 8:46 AM
  • Hi Zhigao,

    Not yet. Did you try the same steps Peter did? Which version of HPC Pack are you running?

    Michael

    Monday, June 4, 2012 5:00 PM
  • Hi Michael,

    I have checked everything listed above. My version is 2008 R2. I tried disabling the firewall on both sides, and that did not work either. I am using topology 1, so for now the HN firewall is on and the CN firewall is off. Also, I have not activated the license on either the HN or the CN. Could this be related to a group policy problem or to firewall rules? I am not sure about this part.

    I found one error in Event Viewer -> Applications and Services Logs -> Microsoft -> HPC -> Scheduler -> Operational: "Node *** is unreachable because no IPV4 address could be found for it." But the node has an internal IPv4 address, 192.168.1.3, which is shown on the Node -> Network tab. The details are below:

    //===========================================================================

      <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
      <Provider Name="Microsoft-HPC-Scheduler" Guid="{5B169E40-A3C7-4419-A919-87CD93F2964D}" />
      <EventID>8</EventID>
      <Version>0</Version>
      <Level>2</Level>
      <Task>0</Task>
      <Opcode>0</Opcode>
      <Keywords>0x8000000000000000</Keywords>
      <TimeCreated SystemTime="2012-06-05T03:59:09.630972500Z" />
      <EventRecordID>2967</EventRecordID>
      <Correlation />
      <Execution ProcessID="1680" ThreadID="860" />
      <Channel>Microsoft-HPC-Scheduler/Operational</Channel>
      <Computer>WindowsHPCHead.WindowsHPCDomain.com</Computer>
      <Security UserID="S-1-5-18" />
      </System>
      <EventData>
      <Data Name="Message">Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it.</Data>
      <Data Name="ExceptionString">Exception detail: Microsoft.Hpc.Scheduler.Properties.SchedulerException: Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it.
        at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.Resolve(String name)
        at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.CreateNewNodeManager(String nodeName)
        at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback)
        Current stack:
        at Microsoft.Hpc.Scheduler.SchedulerTracing.TraceException(String facility, Exception exception)
        at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback)
        at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeListener.ReportNodeInformationEx(String nodeName, ComputeClusterNodeInformation nodeInfo, String& logicalName)
        at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
        at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)
        at System.Runtime.Remoting.Messaging.ServerObjectTerminatorSink.SyncProcessMessage(IMessage reqMsg)
        at System.Runtime.Remoting.Messaging.ServerContextTerminatorSink.SyncProcessMessage(IMessage reqMsg)
        at System.Runtime.Remoting.Channels.CrossContextChannel.SyncProcessMessageCallback(Object[] args)
        at System.Runtime.Remoting.Channels.ChannelServices.DispatchMessage(IServerChannelSinkStack sinkStack, IMessage msg, IMessage& replyMsg)
        at System.Runtime.Remoting.Channels.BinaryServerFormatterSink.ProcessMessage(IServerChannelSinkStack sinkStack, IMessage requestMsg, ITransportHeaders requestHeaders, Stream requestStream, IMessage& responseMsg, ITransportHeaders& responseHeaders, Stream& responseStream)
        at System.Runtime.Remoting.Channels.Tcp.TcpServerTransportSink.ServiceRequest(Object state)
        at System.Runtime.Remoting.Channels.SocketHandler.ProcessRequestNow()
        at System.Runtime.Remoting.Channels.SocketHandler.BeginReadMessageCallback(IAsyncResult ar)
        at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
        at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken)
        at System.Net.Security.NegotiateStream.ProcessFrameBody(Int32 readBytes, Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
        at System.Net.Security.NegotiateStream.ReadCallback(AsyncProtocolRequest asyncRequest)
        at System.Net.FixedSizeReader.CheckCompletionBeforeNextRead(Int32 bytes)
        at System.Net.FixedSizeReader.ReadCallback(IAsyncResult transportResult)
        at System.Net.LazyAsyncResult.Complete(IntPtr userToken)
        at System.Threading.ExecutionContext.runTryCode(Object userData)
        at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
        at System.Net.ContextAwareResult.Complete(IntPtr userToken)
        at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken)
        at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
        at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)</Data>
      </EventData>
      </Event>
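
Event details in this XML form can be reduced to the few fields that matter when comparing many events. The following is a rough Python sketch using the standard library; `summarize_event` is a hypothetical helper, and `SAMPLE_EVENT` is an abbreviated copy of the event above (most `System` fields and the stack trace are left out), not the full record.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> root element of Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Abbreviated copy of the event shown above.
SAMPLE_EVENT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-HPC-Scheduler" />
    <EventID>8</EventID>
    <Level>2</Level>
    <Computer>WindowsHPCHead.WindowsHPCDomain.com</Computer>
  </System>
  <EventData>
    <Data Name="Message">Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it.</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text):
    """Extract the event ID, level, computer, and message from event XML."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    # Collect the named <Data> entries under <EventData> into a dict.
    data = {
        item.get("Name"): (item.text or "")
        for item in root.findall("e:EventData/e:Data", NS)
    }
    return {
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "level": int(system.findtext("e:Level", namespaces=NS)),
        "computer": system.findtext("e:Computer", namespaces=NS),
        "message": data.get("Message", ""),
    }


print(summarize_event(SAMPLE_EVENT))
```

Text copied from a browser's XML viewer may need cleanup before parsing, for example removing expand/collapse markers and escaping bare `&` characters in the stack trace.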

    Thanks & Regards,

    zhigao






    • Edited by zgcheng Tuesday, June 5, 2012 4:05 AM
    Tuesday, June 5, 2012 2:38 AM
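
The error above says that no IPv4 address could be found for the node name, even though the node has one. One way to see what the operating system resolver returns for a name is the following minimal Python sketch. It is only an analogue for diagnosis: `ipv4_addresses` is a hypothetical helper, and nothing here is claimed to match what `NodeController.Resolve` does internally, which the thread does not show.

```python
import socket


def ipv4_addresses(name):
    """Return the sorted IPv4 addresses the OS resolver finds for name."""
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # The name could not be resolved to any IPv4 address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for AF_INET.
    return sorted({info[4][0] for info in infos})


print(ipv4_addresses("localhost"))
```

Running this on the head node with the compute node's fully qualified name would show whether the resolver there returns an IPv4 address at all. An empty list would be consistent with the error message.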
  • Hi,

    You are right. Updating to SP3 solved the problem. Thanks so much.

    Thanks & Regards,

    zhigao

    Tuesday, June 5, 2012 8:21 AM