HPC2008 change from HPC2003 in head node behavior

  • Question

  • I'm sure someone out there is developing on HPC2008.

    Our problem is caused by a drastic change in HPC2008 from HPC2003: the head node ID is no longer zero, and furthermore,
    if you use failover head nodes, the (virtual) head node no longer participates in MPI at all (it is not given an MPI rank).
    Essentially, the (virtual) head node acts only as the job manager.

    I can understand the reason for the change given failover, but under HPC2003 our code used the head node as the gather
    destination for the results and as the broadcast source for the data delivered to the nodes under MPI.
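
    In HPC2003 the pattern looked roughly like the sketch below (the payload and buffer sizes are placeholders and error
    handling is omitted); rank 0 is both the broadcast source and the gather root, and that rank landed on the head node:

        /* Sketch only: how our HPC2003 code relied on rank 0 being the head node.
           The payload and buffer sizes below are placeholders. */
        #include <mpi.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            int rank, size;
            double input[100] = {0};   /* data broadcast from the head node */
            double result = 0.0;       /* this node's computed result */
            double *all = NULL;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0)             /* under HPC2003 this was the head node */
                all = malloc(size * sizeof(double));

            /* head node (rank 0) as the broadcast source ... */
            MPI_Bcast(input, 100, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            /* ... and as the gather destination for the results */
            MPI_Gather(&result, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            free(all);
            MPI_Finalize();
            return 0;
        }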

    Can anyone out there explain how you are coping with this change?   The only way we can think of is to
    write another layer of communication (Remoting or WCF) between the rank 0 node and the (virtual) head node.

    We would like to hear your approach.

    Thank you.
    Monday, June 29, 2009 9:31 PM

Answers

  • You are correct: when using HPC2008 High Availability, neither the head nodes nor the virtual cluster name can be used as compute nodes. However, without HA the head node can still be used as a compute node. To enable that, change the head node role to include the compute node role (take the head node offline, change the role, and bring the head node back online).

    However, the way that MPI rank 0 is allocated is unrelated. Rank 0 always runs on the first node allocated to the job. That is, the HPC job scheduler picks a set of nodes for the job/task, and mpiexec starts rank 0 on the first of those nodes. To see the list of nodes, use 'task view <id>' or echo %CCP_NODES% while the task executes.
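
    For example, a task could simply dump that variable to confirm which node comes first (a trivial sketch; the scheduler
    sets CCP_NODES in the task's environment):

        /* Sketch: from inside a running task, print the node list the scheduler
           assigned; rank 0 starts on the first node in this list. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            const char *nodes = getenv("CCP_NODES");
            printf("CCP_NODES = %s\n", nodes ? nodes : "(not set)");
            return 0;
        }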

    How did your MPI application return results to the user without writing files? Did it connect back to the user application using sockets/WCF? If it did, you can still connect back from whichever compute node is running rank 0. When connecting back to a user application outside the cluster, the connection will go through NAT on the head node (assuming you're using network topology #1 or #3) when your compute nodes are isolated on a private network.
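
    As an illustration only (not your actual code), rank 0 could gather the results and push them to the user's machine over
    a plain TCP socket; the host name "workstation.example.com" and port 5000 below are placeholders, and error handling is
    omitted:

        /* Sketch only: rank 0 pushes gathered results back to the user's machine
           over a plain TCP socket. Host name and port are placeholders. */
        #include <mpi.h>
        #include <winsock2.h>
        #include <ws2tcpip.h>

        #pragma comment(lib, "ws2_32.lib")

        static void send_results(const double *buf, int count)
        {
            WSADATA wsa;
            struct addrinfo hints = {0}, *res = NULL;
            SOCKET s;

            WSAStartup(MAKEWORD(2, 2), &wsa);
            hints.ai_family = AF_INET;
            hints.ai_socktype = SOCK_STREAM;
            getaddrinfo("workstation.example.com", "5000", &hints, &res);

            s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
            connect(s, res->ai_addr, (int)res->ai_addrlen);
            send(s, (const char *)buf, count * (int)sizeof(double), 0);

            closesocket(s);
            freeaddrinfo(res);
            WSACleanup();
        }

        int main(int argc, char **argv)
        {
            int rank, size;
            double result = 0.0, all[1024];   /* placeholder capacity */

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            MPI_Gather(&result, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            if (rank == 0)                    /* whichever compute node got rank 0 */
                send_results(all, size);

            MPI_Finalize();
            return 0;
        }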

    thanks,
    .Erez
    • Marked as answer by Don Pattee Friday, February 4, 2011 10:19 PM
    Monday, August 10, 2009 6:53 PM

All replies

  • Hi,

    Can you give more details? (What do you mean by "the head node ID is no longer zero"?)

    You can change the role of the head node to be a head node only, or a head node and a compute node. Jobs are submitted only to the compute nodes (that are online).

    To change the role of the head node, take it offline first, change the role, and bring it back online.

    thanks,
    .Erez
    • Proposed as answer by Don Pattee Saturday, July 25, 2009 6:27 AM
    • Unproposed as answer by Don Pattee Tuesday, August 11, 2009 1:21 AM
    Monday, July 6, 2009 5:05 PM
  • Under failover clustering with HPC2008, the head node no longer does computation (according to the documentation).   I just tested this by printing the rank and the hostname from an MPI program.   Our program was talking to the head node, which under HPC2003 turned out to be the compute node with ID=0, and this ID=0 node was used in MPI gathers to get the computation results back to the user without writing files.   We now have to rewrite our application to get the MPI results from a compute node.
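
    The test was essentially the following (simplified):

        /* Simplified version of the test: print each rank and the host it runs on. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, len;
            char host[MPI_MAX_PROCESSOR_NAME];

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Get_processor_name(host, &len);
            printf("rank %d on %s\n", rank, host);
            MPI_Finalize();
            return 0;
        }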

    Is this enough?
    Thursday, August 6, 2009 2:40 PM