Disaster recovery options with HPC Pack 2012 R2?

  • Question

  • Hello,

    I was wondering what our options are for disaster recovery of the head node. I saw that a high-availability setup is possible (https://technet.microsoft.com/en-us/library/hpc_cluster_failover_cluster_for_soa.aspx?f=255&MSPPError=-2147217396), but I'm not sure our IT department can support the setup it requires, and we don't actually need HA for this application. Are there any disaster recovery options a step below HA that would require less initial setup? I would like to set up two head nodes, one hot and one warm, and if the first head node starts having issues, issue a command that switches the broker nodes to start connecting to and accepting commands from the backup head/broker node. We would be OK with either each head node having its own local database instance or an external database elsewhere that they could share if needed. We attempted this by setting up two independent head nodes and then tried switching the compute nodes from one head node to the other (via the HPC command line), but that didn't seem to work at all.

    Is this setup possible? Or is the HA option the only one that is supported? We are OK with some downtime in this application, so we wanted to explore the options before going down the path of trying to set up HA.

    Thanks!


    Jason

    Monday, July 27, 2015 2:01 PM

Answers

  • If you don't need real-time failover when something goes wrong on the head node, there are a couple of options you can try. Most importantly, you need to plan disaster recovery for the SQL database, which has plenty of options of its own.

    1. Back up and restore the HPC database. You may lose the work of any running jobs. Details here: https://technet.microsoft.com/en-us/library/hh332930(v=ws.10).aspx

    2. Standby head node. This hasn't been documented publicly, but it works. Set up two head nodes, each with its own job database, create a CNAME record that points to the active head node, and have the clients always use the CNAME instead of the real head node name. When switching the CNAME to the other head node, use https://technet.microsoft.com/en-us/library/dn606163.aspx to move the nodes to the new head node as well.
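After repointing the CNAME in option 2, it can help to confirm which head node the alias resolves to before moving the compute nodes. The following is only a minimal sketch in Python, not part of HPC Pack: the alias and host names in the usage note are hypothetical, and it assumes the local resolver reports the CNAME target as the canonical name, which can vary by platform and DNS cache state.

```python
import socket


def active_head_node(alias, resolver=socket.gethostbyname_ex):
    """Return the canonical host name that the alias currently resolves to.

    The resolver is injectable so the function can be exercised without DNS.
    """
    canonical, _aliases, _addresses = resolver(alias)
    return canonical.lower()


def alias_points_to(alias, expected_head_node, resolver=socket.gethostbyname_ex):
    """Check whether the alias currently resolves to the expected head node."""
    canonical = active_head_node(alias, resolver)
    expected = expected_head_node.lower()
    # Accept an exact match, or a match on the short (first-label) host name.
    return canonical == expected or canonical.split(".")[0] == expected.split(".")[0]
```

For example, `alias_points_to("hpc-cluster.corp.example.com", "headnode2")` (both names are placeholders) would be expected to return `True` only once the DNS change has propagated to the machine running the check.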


    Qiufang Shi

    Tuesday, July 28, 2015 3:49 AM

All replies

  • Your setup looks possible. Can you share more about what didn't work for you?

    Qiufang Shi

    Tuesday, July 28, 2015 3:51 AM
  • Thanks, I'll give it another shot later today, but we ran into a couple of issues, if I recall: 1) The clients didn't register properly when using the vdns on the client. I will retry and get the error, but when we switched it back to the physical name it registered properly. When submitting the jobs themselves we could reference the vdns, but client registration required the physical name. 2) I believe we used a different script than the one you linked. My coworker did this testing and unfortunately he is out this week. I will try that again in our environment and see if it works. I'm thinking he found some command that may have been meant for a different task. Thanks!

    EDIT: I just realized you meant to use the CNAME only on the client side, rather than configuring the compute nodes to use the CNAME instead of the physical head node name when setting them up. So it looks like we were on the right track. I will try the PowerShell script you linked to in order to flip over to the standby node.


    Thanks!

    Tuesday, July 28, 2015 9:35 AM
  • Wanted to let you know that I retried, and once I switched to that PowerShell script, I was able to flip between the head nodes without issue.

    Thanks again!

    Tuesday, July 28, 2015 1:13 PM