Compute Node unreachable

  • Question

  • When the machine boots, everything is OK. My compute nodes become unreachable after running for a while (half a day to a day; it happens randomly), even though the state of the nodes still shows as online.

    Originally, two years ago, there were 6 nodes: one head node and 5 compute nodes. The OS is HPC Server 2008 SP2 (not R2). Then I added 6 new compute nodes with HPC Server 2008 SP1 (not R2). The problem only happens on the 6 new nodes.

    I have checked the following points:

    Everything works: ping in both directions, ping to the domain controller, ping to the head node's hostname from a compute node, connecting to a compute node over RDP directly from Cluster Manager...
    But the compute nodes keep becoming unreachable.

    The problem is really similar to this one, but my OS is HPC Server 2008 SP1, and I can't find where to download SP2. In addition, I don't know whether SP2 would solve the problem. I could only find SP2 for Server 2008, and when I double-click the file it reports invalid data.


    Where is the problem in my case? I can't change the OS of the head node, and for security reasons none of the nodes have Internet access.
    Monday, July 8, 2013 4:54 AM

All replies

  • Similar problem here.

    Cluster built on Hyper-V (Domain Controller, Head Node, Compute Nodes)

    OS: Windows Server 2008 R2 SP1, HPC Pack SP1

    On the current machine I set up a virtual cluster twice and it worked fine. The third time, however (all VMs are new), I have a connectivity problem: none of the compute nodes connect to the head node.

    * ping HN <=> CN is okay

    * CCP_SCHEDULER variable is set

    * HPC services on the compute nodes and on the head node are in the running state

    * netstat -anb -p tcp shows HPC Management Service socket in CLOSE_WAIT state

    * firewall is OFF
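
    The CLOSE_WAIT check above can be repeated across several nodes by saving the netstat output to a file and filtering it. The following is only a minimal sketch: it assumes the plain column layout of `netstat -an -p tcp` (protocol, local address, foreign address, state), and the addresses and ports in the sample are made up for illustration, not taken from a real cluster.

```python
from typing import List, Tuple


def close_wait_connections(netstat_output: str) -> List[Tuple[str, str]]:
    """Return (local, remote) address pairs for TCP rows in CLOSE_WAIT."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP <local address> <foreign address> <state>
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "CLOSE_WAIT":
            rows.append((fields[1], fields[2]))
    return rows


# Hypothetical sample output; addresses and ports are illustrative only.
sample = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    10.0.0.11:49200        10.0.0.1:5970          CLOSE_WAIT
  TCP    10.0.0.11:49201        10.0.0.1:445           ESTABLISHED
"""

for local, remote in close_wait_connections(sample):
    print(f"CLOSE_WAIT: {local} -> {remote}")
```

    A socket in CLOSE_WAIT means the remote side has closed the connection but the local process has not closed its end yet, so the remote address in each pair shows which peer hung up.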

    Daniel Drypczewski

    Thursday, July 11, 2013 9:19 AM
  • In my case some machines in the network had the same SID and that was causing connectivity problems.

    Can you check the logs?

    Head Node

    Event Viewer\Applications and Services Logs\Microsoft\HPC\Scheduler\Operational

    Event Viewer\Applications and Services Logs\Microsoft\HPC\Scheduler\Admin

    Compute Node

    Event Viewer\Applications and Services Logs\Microsoft\HPC\Management\Operational

    Event Viewer\Applications and Services Logs\Microsoft\HPC\Management\Admin

    Daniel Drypczewski

    Tuesday, July 16, 2013 5:36 AM
  • Thank you for your reply! I will check it tomorrow because I am out now. If it turns out to be the same situation you described, how can I change the SID?
    Tuesday, July 16, 2013 5:43 AM
  • Wednesday, July 17, 2013 4:46 AM
  • Thank you very much! I tried typing "whoami /user" in cmd and found that all compute nodes have the same SID. Does it depend on the user name you log in with?

    I tested it over RDP through the head node.

    Wednesday, July 17, 2013 6:08 AM
  • There is a machine SID and an account SID.

    According to this info


    account SID = machine SID + RID (Relative ID)

    If the machine SID is the same on each node, you may encounter security problems.
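
    The relation above can be used to compare nodes once the SIDs have been collected. This is a minimal sketch, assuming you have gathered the SID of a local account (for example the built-in Administrator, RID 500) from each node by hand; the node names and SID values below are made up. Note that it only makes sense for local accounts: a domain account's SID is built from the domain SID, so "whoami /user" for a domain login would be expected to show the same value on every node.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_account_sid(account_sid: str) -> Tuple[str, int]:
    """Split an account SID into (machine SID, RID) at the last hyphen."""
    machine_sid, _, rid = account_sid.rpartition("-")
    return machine_sid, int(rid)


def nodes_sharing_machine_sid(account_sids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each machine SID used by more than one node to those node names."""
    by_machine = defaultdict(list)
    for node, sid in account_sids.items():
        machine_sid, _rid = split_account_sid(sid)
        by_machine[machine_sid].append(node)
    return {m: sorted(nodes) for m, nodes in by_machine.items() if len(nodes) > 1}


# Hypothetical local Administrator SIDs, one per node (values are illustrative).
collected = {
    "NODE01": "S-1-5-21-1111111111-2222222222-3333333333-500",
    "NODE02": "S-1-5-21-1111111111-2222222222-3333333333-500",
    "NODE03": "S-1-5-21-4444444444-5555555555-6666666666-500",
}

for machine_sid, nodes in nodes_sharing_machine_sid(collected).items():
    print(f"{machine_sid} is shared by: {', '.join(nodes)}")
```

    In this made-up data NODE01 and NODE02 would be reported together, which is the pattern to look for on machines that may have been cloned from the same image.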

    Daniel Drypczewski

    Wednesday, July 17, 2013 9:00 AM
  • Thank you for your reply. I will try it again.
    Friday, July 19, 2013 3:30 PM