HPC installation on Failover Cluster fails when starting services

  • Question

  • Hi,

    we have an issue with deploying HPC Server 2008 R2 in a Failover Cluster. The cluster itself has been created and we also have the file service installed. Failover for the file service is working as well. However, the installation fails when starting the HPC services. None of them come up, and the following errors are logged:

    CAQuietExec:  16:07:47.143: F3- 369: Bringing group 'vesta' online
    CAQuietExec:  16:07:47.159: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:48.173: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:49.187: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:50.201: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:51.215: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:52.229: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:53.243: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:54.257: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:55.271: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:56.285: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:57.299: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:58.313: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:07:59.327: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:00.341: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:01.355: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:02.369: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:03.383: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:04.397: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:05.411: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:06.425: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:07.439: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:08.453: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:09.467: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:10.481: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:11.495: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:12.509: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:13.523: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:14.537: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:15.551: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:16.565: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:17.579: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:18.593: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:19.607: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:20.621: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:21.635: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:22.649: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:23.663: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:24.677: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:25.691: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:26.705: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:27.719: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:28.733: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:29.747: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:30.761: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:31.775: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:32.789: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:33.803: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:34.817: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:35.831: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:36.845: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:37.859: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:38.873: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:39.887: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:40.901: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:41.915: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:42.929: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:43.943: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:44.957: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:45.971: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:46.985: F3- 402: Group 'vesta' state is Pending; sleeping
    CAQuietExec:  16:08:48.591: F2- 502: Setting install state to 128 for this node
    CAQuietExec:  16:08:48.607: F1- 271: Installed Failed: Exception @ line 410 in file 3 - error 997: File Server Group failed to go online
    CAQuietExec:  
    CAQuietExec:  Error 0x800703e5: Command line returned an error.
    CAQuietExec:  Error 0x800703e5: CAQuietExec Failed
    CustomAction ConfigureHA returned actual error code 1603 (note this may not be 100% accurate if translation happened inside sandbox)
    Action ended 16:08:48: InstallFinalize. Return value 3.
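
To read a log like the one above more quickly, the polling lines can be collapsed into a count and a wait time. The following is only a minimal sketch: the function name `summarize` and the regular expressions are assumptions based on the line format shown in this post, not on any documented HPC setup log format.

```python
import re
from datetime import datetime

# Matches polling lines such as:
#   CAQuietExec:  16:07:47.159: F3- 402: Group 'vesta' state is Pending; sleeping
POLL = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}): F\d+-\s*\d+: Group '([^']+)' state is (\w+); sleeping"
)
# Matches the failure line, e.g. "... - error 997: File Server Group failed to go online"
FAIL = re.compile(r"error (\d+): (.+)$")


def summarize(lines):
    """Count the polling lines, measure how long setup waited, and keep the first failure."""
    polls = []
    failure = None
    for line in lines:
        m = POLL.search(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            polls.append((stamp, m.group(2), m.group(3)))
            continue
        m = FAIL.search(line)
        if m and failure is None:
            failure = (int(m.group(1)), m.group(2).strip())
    waited = (polls[-1][0] - polls[0][0]).total_seconds() if polls else 0.0
    return {"polls": len(polls), "waited_seconds": waited, "failure": failure}


if __name__ == "__main__":
    sample = [
        "CAQuietExec:  16:07:47.159: F3- 402: Group 'vesta' state is Pending; sleeping",
        "CAQuietExec:  16:08:46.985: F3- 402: Group 'vesta' state is Pending; sleeping",
        "CAQuietExec:  16:08:48.607: F1- 271: Installed Failed: Exception @ line 410 "
        "in file 3 - error 997: File Server Group failed to go online",
    ]
    print(summarize(sample))
```

Applied to the full log, this makes it easier to see that the group stayed in the Pending state for about a minute before setup gave up.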
    

    The databases are located on a remote server, and the installer is able to create the ccpheadnode user. This user is also assigned the db_owner role on all four databases. Nevertheless, the SQL Server log contains errors:

    01/18/2012 16:07:48,Logon,Unknown,Login failed for user 'VISUS\VESTA$'. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: 129.69.205.7]
    01/18/2012 16:07:48,Logon,Unknown,Error: 18456, Severity: 14, State: 11.
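
The two SQL Server log entries above carry the account name and the error, severity, and state numbers. As a rough sketch (the function name `parse_login_failure` and both patterns are assumptions derived from these two lines only), they can be extracted as follows. The trailing `$` in the account name marks a computer account, so the rejected login is a machine account rather than the ccpheadnode user.

```python
import re

# Pattern for the account named in a "Login failed" entry.
LOGIN = re.compile(r"Login failed for user '([^']+)'")
# Pattern for the numeric fields; \D+ tolerates whatever separator the export used.
CODES = re.compile(r"Error:\s*(\d+)\D+Severity:\s*(\d+)\D+State:\s*(\d+)")


def parse_login_failure(lines):
    """Collect the account and the error/severity/state numbers from log lines."""
    info = {}
    for line in lines:
        m = LOGIN.search(line)
        if m:
            account = m.group(1)
            info["account"] = account
            # Windows computer accounts end with a dollar sign.
            info["is_machine_account"] = account.endswith("$")
        m = CODES.search(line)
        if m:
            info["error"], info["severity"], info["state"] = map(int, m.groups())
    return info


if __name__ == "__main__":
    sample = [
        r"01/18/2012 16:07:48,Logon,Unknown,Login failed for user 'VISUS\VESTA$'. "
        r"Reason: Token-based server access validation failed with an infrastructure error.",
        "01/18/2012 16:07:48,Logon,Unknown,Error: 18456, Severity: 14, State: 11.",
    ]
    print(parse_login_failure(sample))
```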

    Any suggestions?

    Best regards,
    Christoph


    Wednesday, January 18, 2012 3:49 PM

Answers

  • Got it: There is a problem with Windows 2008 R2, UAC, and the Administrators group. I took the easy route and used SQL authentication instead of integrated authentication during setup, and everything has worked fine so far.

    Frank

    • Proposed as answer by Frank Friebe Wednesday, April 25, 2012 1:52 PM
    • Marked as answer by Don Pattee Sunday, May 13, 2012 2:54 AM
    Wednesday, April 25, 2012 1:52 PM

All replies

  • I'm currently running into a similar issue. Did you find a solution?

    Frank

    Wednesday, April 25, 2012 1:11 PM
  • You need to figure out why the Failover Cluster resource group failed to go online. There are various ways of doing that. I'd start with the System log in Event Viewer, where the Failover Cluster service logs its events. Filter on Failover Cluster as the source (probably including warnings as well as errors), find the failures, and see if anything meaningful is described. Failover clustering is a bit removed from how HPC uses it, so the event log entries may not contain actionable data (as in, all they tell you is that something failed, which you know already).

    If that is the case, you have to reinstall, but there is a bit of a chicken-and-egg problem: if any of the HPC services fail to start, the installation is considered failed, and HPC setup (via Windows Installer) rolls back and uninstalls everything, including the event logs that could provide insight into why the installation failed.

    The trick is to wait until HPC setup displays the dialog box indicating that installation failed, but not dismiss it. Also have the Failover Cluster Manager snap-in up and running so you can observe the installation from that perspective. When the failure dialog appears, bring up Event Viewer again and look for the various operational and administrative logs under Applications and Services Logs/Microsoft/HPC. If you were watching the resource group while it was being brought online, you can probably figure out which service is failing. With that information, inspect the corresponding log and see if you can figure out why it is failing.

    I would guess a permissions issue of some sort, but since there are other resources in the group (disks, IP addresses, etc.), they could be at fault as well. Prior to installing HPC, you should double-check your HA configuration by failing the resource group over and back again to make sure that there are no issues there.

    If you report back and need more help, it would be good to have the particulars of your configuration, namely whether SQL is part of the resource group, etc.

    c

    Tuesday, May 1, 2012 9:13 PM
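
The System log filtering suggested in the last reply can also be done from a command line with `wevtutil`. The sketch below only builds the command. The provider name `Microsoft-Windows-FailoverClustering`, the helper name `build_cluster_event_query`, and the default of 50 events are assumptions to adjust for the system at hand; the query selects critical, error, and warning events (levels 1 to 3), newest first.

```python
import subprocess


def build_cluster_event_query(max_events=50):
    """Build a wevtutil command that lists recent Failover Clustering problems."""
    # Event levels: 1 = Critical, 2 = Error, 3 = Warning.
    # The provider name is an assumption; verify it against the local System log.
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-FailoverClustering']"
        " and (Level=1 or Level=2 or Level=3)]]"
    )
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        "/f:text",            # human-readable output
        f"/c:{max_events}",   # limit the number of events returned
        "/rd:true",           # newest events first
    ]


def run_query(cmd):
    """Run the command on a Windows node and return its text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    # Only print the command here; call run_query() on the cluster node itself.
    print(" ".join(build_cluster_event_query()))
```

Running the query while the setup failure dialog is still open keeps the relevant entries available before the rollback removes the HPC logs.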