Compute Node unreachable
-
19 March 2012 15:57
My compute node stays unreachable even after I checked every point in this TechNet article:
http://technet.microsoft.com/en-us/library/ee523229%28WS.10%29.aspx
Everything works: ping from both sides, file share connections, ping to the domain controller, RDP directly from Cluster Manager...
But the compute node stays unreachable. Where is the problem in my case?
FYI, my configuration:
Topology: Compute nodes isolated on a private network
The infrastructure is hosted by an external provider.
BR
All replies
-
20 March 2012 03:29
Hello.
Did you apply a node template to these nodes?
-
20 March 2012 06:32
Hi,
Sure, I have assigned a node template (a very simple one). Even when I take the node offline and reassign it, the node still stays unreachable...
-
27 March 2012 23:47
Could you go to one of the CNs and make sure "HPCNodeManagerService" is running? It would also be helpful to check the event logs on the CNs.
Michael
-
29 March 2012 09:40
Hi,
HPCNodeManagerService is running, and there are no errors or warnings in the event log of the CN.
BR
Peter
-
29 March 2012 11:44
I found something:
The host name of the CN is written in the hosts file and is not listed in DNS.
So ping to the server works, but nslookup fails, as you would expect. Can that be a problem for HPC?
Peter
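
To make the hosts-file side of this comparison concrete, here is a minimal Python sketch that reads hosts-file text and reports which required node names have no entry. It only inspects the file content and does not query DNS. The function names, the sample addresses, and the node names are illustrative assumptions and do not come from HPC Pack.

```python
def parse_hosts(text):
    """Map each lowercase host name in hosts-file text to its list of addresses."""
    table = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


def missing_from_hosts(text, required_names):
    """Return the required names that have no entry in the hosts-file text."""
    table = parse_hosts(text)
    return [name for name in required_names if name.lower() not in table]


if __name__ == "__main__":
    # Hypothetical sample content. On Windows the real file is usually
    # %SystemRoot%\System32\drivers\etc\hosts.
    sample = """
    # private network
    192.168.1.3   cn001.example.local cn001
    """
    print(missing_from_hosts(sample, ["cn001", "cn002"]))
```

Running the same check against the hosts file on both the head node and the compute node shows whether the two machines agree on the entries. It says nothing about what a DNS-only tool such as nslookup will return.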
-
3 April 2012 00:13
Hi Peter,
It doesn't look like a problem with HPC; you should work on getting your DNS server fixed. If DNS is not working, CNs will go unreachable.
Michael
- Proposed as answer by Michael_Man 3 April 2012 00:13
- Unproposed as answer by Peter Schaunitzer 30 May 2012 13:35
-
29 May 2012 10:51
Hi Michael,
Now I have this problem with all of my compute nodes. Do they all need an entry in DNS?
Is the configuration in ../etc/hosts not enough?
I can ping the node, and Telnet from the head node to the compute node on the Node Management Service port (1856) also succeeds.
BR
Peter
-
29 May 2012 20:22
Hi Peter,
I'd say if you have the same entry in ../etc/hosts on the HN, you should be good to go.
Michael
-
30 May 2012 05:28
Hi,
Yes, the entry is also defined in the hosts file of the head node.
But still: "Compute Node unreachable"...
Everything else is working fine:
- Telnet from HN to CN on port 1856 ==> OK
- RDP from HN to CN ==> OK
- Ping from HN to CN ==> OK
- Ping from CN to HN ==> OK
- Assign node template to CN ==> OK
So I don't understand why I get the error message "HPC Node Manager Service unreachable".
And that's the problem on all CNs.
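
The Telnet check in the list above can also be scripted. The following is a minimal Python sketch of a TCP connect test, assuming a hypothetical node name "cn001" and the port 1856 mentioned in this thread. As this post shows, a successful connect only proves TCP reachability; the node can still be reported as unreachable.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # "cn001" is a hypothetical node name; 1856 is the port named in the thread.
    print(tcp_port_open("cn001", 1856))
```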
-
31 May 2012 16:24
Hi Peter,
Did you check your firewall? Could you try disabling firewalls on both sides?
Michael
-
1 June 2012 05:15
Hi Michael,
The firewall is disabled on both sides. And when I look at TCPView or netstat, there are many open connections between the CN and the HN...
BT
-
1 June 2012 17:11
And you have the latest HPC Pack installed, right?
Please email me at micman@microsoft.com so we can discuss your problem further. It's definitely a very interesting one to look at...
Michael
-
4 June 2012 08:46
Hi,
Have you guys sorted out the problem? I have exactly the same problem and do not know how to solve it.
Any help is appreciated.
Thanks,
zhigao
-
4 June 2012 17:00
Hi Zhigao,
Not yet. Did you try the same steps Peter did? Which version of HPC Pack are you running?
Michael
-
5 June 2012 02:38
Hi Michael,
I have checked everything listed above. My version is 2008 R2. I tried disabling the firewall on both sides, and that did not work either. I am using topology 1, so now the HN firewall is on and the CN firewall is off. Also, I have not activated the license on the HN or the CN. Is this related to a group policy problem or to firewall rules? I am not quite sure about this part.
I found one error in Event Viewer -> Applications and Services Logs -> Microsoft -> HPC -> Scheduler -> Operational: "Node *** is unreachable because no IPV4 address could be found for it." But the node has an internal IPv4 address, 192.168.1.3, which is shown on the Node -> Network tab. The details are below:
//===========================================================================
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-HPC-Scheduler" Guid="{5B169E40-A3C7-4419-A919-87CD93F2964D}" />
<EventID>8</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2012-06-05T03:59:09.630972500Z" />
<EventRecordID>2967</EventRecordID>
<Correlation />
<Execution ProcessID="1680" ThreadID="860" />
<Channel>Microsoft-HPC-Scheduler/Operational</Channel>
<Computer>WindowsHPCHead.WindowsHPCDomain.com</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="Message">Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it.</Data>
<Data Name="ExceptionString">Exception detail: Microsoft.Hpc.Scheduler.Properties.SchedulerException: Node WINDOWSHPCN1001.WindowsHPCDomain.com is unreachable because no IPV4 address could be found for it. at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.Resolve(String name) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.CreateNewNodeManager(String nodeName) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback) Current stack: at Microsoft.Hpc.Scheduler.SchedulerTracing.TraceException(String facility, Exception exception) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeController.PingUnreachableNode(String nodeName, PingArg arg, NodeCommunicatorCallBack`1 callback) at Microsoft.Hpc.Scheduler.Communicator.Remoting.NodeListener.ReportNodeInformationEx(String nodeName, ComputeClusterNodeInformation nodeInfo, String& logicalName) at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs) at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext) at System.Runtime.Remoting.Messaging.ServerObjectTerminatorSink.SyncProcessMessage(IMessage reqMsg) at System.Runtime.Remoting.Messaging.ServerContextTerminatorSink.SyncProcessMessage(IMessage reqMsg) at System.Runtime.Remoting.Channels.CrossContextChannel.SyncProcessMessageCallback(Object[] args) at System.Runtime.Remoting.Channels.ChannelServices.DispatchMessage(IServerChannelSinkStack sinkStack, IMessage msg, IMessage& replyMsg) at System.Runtime.Remoting.Channels.BinaryServerFormatterSink.ProcessMessage(IServerChannelSinkStack sinkStack, IMessage requestMsg, ITransportHeaders requestHeaders, Stream requestStream, IMessage& responseMsg, ITransportHeaders& responseHeaders, Stream& responseStream) at 
System.Runtime.Remoting.Channels.Tcp.TcpServerTransportSink.ServiceRequest(Object state) at System.Runtime.Remoting.Channels.SocketHandler.ProcessRequestNow() at System.Runtime.Remoting.Channels.SocketHandler.BeginReadMessageCallback(IAsyncResult ar) at System.Net.LazyAsyncResult.Complete(IntPtr userToken) at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken) at System.Net.Security.NegotiateStream.ProcessFrameBody(Int32 readBytes, Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest) at System.Net.Security.NegotiateStream.ReadCallback(AsyncProtocolRequest asyncRequest) at System.Net.FixedSizeReader.CheckCompletionBeforeNextRead(Int32 bytes) at System.Net.FixedSizeReader.ReadCallback(IAsyncResult transportResult) at System.Net.LazyAsyncResult.Complete(IntPtr userToken) at System.Threading.ExecutionContext.runTryCode(Object userData) at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Net.ContextAwareResult.Complete(IntPtr userToken) at System.Net.LazyAsyncResult.ProtectedInvokeCallback(Object result, IntPtr userToken) at System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)</Data>
</EventData>
</Event>
Thanks & Regards,
zhigao
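
For readers who want to script this diagnosis, here is a minimal Python sketch that pulls the "Message" field out of an event like the one above and then asks the local resolver for IPv4 addresses of a name. It assumes the event XML is well-formed. The function names are illustrative, and the lookup uses the operating system resolver, which may not be the same path the HPC scheduler takes internally.

```python
import socket
import xml.etree.ElementTree as ET

# Namespace taken from the <Event> element in the post.
EVENT_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def event_message(event_xml):
    """Return the text of <Data Name="Message"> from one event, or None."""
    root = ET.fromstring(event_xml)
    for data in root.findall("e:EventData/e:Data", EVENT_NS):
        if data.get("Name") == "Message":
            return data.text
    return None


def ipv4_addresses(name):
    """Return the sorted IPv4 addresses the local resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []  # the name did not resolve to any IPv4 address
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Node name copied from the event in the post; run this on the head node.
    print(ipv4_addresses("WINDOWSHPCN1001.WindowsHPCDomain.com"))
```

An empty list from `ipv4_addresses` on the head node would be consistent with the scheduler message, though it does not by itself identify the cause.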
- Edited by zgcheng 5 June 2012 04:05
-
5 June 2012 06:23
Hi,
Thanks to Michael_Man:
SP3 for HPC Pack 2008 R2 has solved my problem:
http://www.microsoft.com/en-us/download/details.aspx?id=28017
Now everything works fine.
BR
Peter
- Marked as answer by Peter Schaunitzer 5 June 2012 06:23
-
5 June 2012 08:21
Hi,
You are right. Updating to SP3 solved the problem. Thanks so much.
Thanks & Regards,
zhiga