Error Installing HPC 2008 SP1 on Head Node Fail Over Cluster

  • Question

  • Hello,
    I am experiencing problems installing HPC Pack 2008 Service Pack 1 onto a head node fail over cluster.  I already have the cluster up and running without the service pack.

    The release note for the service pack (http://technet.microsoft.com/en-us/library/ee221139(WS.10).aspx) clearly states that it should first be installed on the passive node, then on the active node. However, when I attempt to run the service pack installer on the passive node, the following error is given:

    'The Failover Clustering resource for the Microsoft HPC SDM Store service must be online and hosted on this node in order to install the patch/upgrade. (error code: 5007)'

    I have tried installing on the active node first, but I was then unable to install on the passive node. As a result the resources could not fail over, and I ended up breaking the configuration.

    Has anyone else experienced this problem, or does anyone know of a workaround/fix?

    Many Thanks
    Phil
    Tuesday, February 2, 2010 3:05 PM

Answers

  • Hi Phil
    I guess you are running the SQL instance under the credentials of a domain user? If so, I think you may find that the SP1 install has reset permissions on the SQL data files / containing folder so that the service account does not have access to its data/log files. If you re-add permissions to the database files for your SQL service account you should be able to restart the DB instance, and from there the HPC services on the patched node.
    Let me know how you get on with this.
    Regards
    Dan
    • Marked as answer by phil_e Wednesday, February 3, 2010 1:43 PM
    Wednesday, February 3, 2010 12:07 PM
  • Hi Dan,

    You were right!

    For whatever reason, installation of SP1 had removed the NTFS permissions for the SQL Server group from the MSSQL folders which house the HPC databases.

    Having reapplied the permissions, I have now successfully started all resources and installed the service pack on the remaining node.
    Everything now seems to be in working order.

    So to recap, the problem was down to:
    1, The Microsoft release note for the service pack giving the incorrect install procedure
    2, The service pack installer removing the NTFS permissions for the SQL Server group
    Thanks for that!

    For anyone else who experiences problems installing HPC 2008 Service Pack 1 on a head node fail over cluster, here is a summary of the procedure that worked for me:

    1. Install service pack on active node
    2. The node will restart following the install and the cluster resources will fail over to the alternate node. At this stage SQL Server, and therefore the HPC cluster resources, will fail to start
    3. Reapply the NTFS permissions for the SQL Server group to the MSSQL folder which holds the HPC instance data
    4. Start the HPC cluster resources on the un-service packed node
    5. Install the service pack on this node
    6. Once the install has completed and the node has restarted, the cluster group for HPC should start on either node.
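
    Step 3 above could be scripted along the following lines. This is a minimal sketch, not the procedure actually used in this thread: it assumes the permissions are restored with the standard Windows `icacls` tool, that the folder is the `S:\Microsoft SQL Server\MSSQL.1\MSSQL` path seen in the SQL error log, and that Full control is the right level to grant. The group name `EXAMPLE\SqlServiceGroup` and the helper names are hypothetical placeholders; substitute the group your SQL Server instance really uses. By default the script only prints the command (dry run).

```python
import subprocess
from typing import List


def build_icacls_grant(folder: str, account: str, rights: str = "F") -> List[str]:
    """Build an icacls command granting `account` inheritable rights on `folder`.

    (OI)(CI) makes the grant inherit to files and subfolders; /T also applies
    it to everything that already exists below `folder`. "F" is Full access.
    """
    return ["icacls", folder, "/grant", f"{account}:(OI)(CI){rights}", "/T"]


def regrant(folder: str, account: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it on the node owning the disk."""
    cmd = build_icacls_grant(folder, account)
    if dry_run:
        # Show the command as it would be typed at a Windows prompt.
        print(subprocess.list2cmdline(cmd))
    else:
        # Needs an elevated prompt; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Assumed values: the path comes from the ERRORLOG message in this thread,
    # the group name is a made-up placeholder.
    regrant(r"S:\Microsoft SQL Server\MSSQL.1\MSSQL", r"EXAMPLE\SqlServiceGroup")
```

    Run it first with the default dry run and check the printed command before calling `regrant(..., dry_run=False)` on the node that currently hosts the shared disk.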

    Thanks to Dan for his help with this.

    Phil
    • Marked as answer by phil_e Wednesday, February 3, 2010 1:43 PM
    Wednesday, February 3, 2010 1:42 PM

All replies

  • Hi Phil
    Are you able to give some more information on your configuration, please? Once you have installed SP1 on the active node, which clustered services are unable to start? Is the clustered SQL instance running OK?
    Any useful information in event logs?
    Cheers
    Dan

    Tuesday, February 2, 2010 3:55 PM
  • Hi, thanks for your reply,

    All of the clustered services are running OK without SP1 installed.
    I can install SP1 on the active node; however, due to the error detailed above, I am unable to install it on the passive node.  If I attempt to move the cluster resources to the passive node after installing SP1 then they fail to start, presumably because SP1 is not on that node.  (I am assuming that SP1 includes updates to the database.)

    I do not have the exact errors that occur when I attempt to move the cluster resources from the active sp1 node to the passive non-sp1 node as I have since rebuilt the cluster.

    It seems that the SP1 installer is not behaving as detailed in the release notes. I was hoping someone may have a suggested workaround/fix for this so I could install the service pack as documented, rather than try to work around the issue by installing on the active node first.  However, if this is not forthcoming I will retry installing on the active node and post the exact errors received when I attempt the failover.

    Thanks,
    Phil

    Tuesday, February 2, 2010 4:49 PM
  • Hi Phil
    I went through the "install on the active node first" workaround when installing SP1. I'm not sure, but I think the documentation is incorrect in this instance.
    When you attempt the failover, are you then able to bring resources back up on the already service packed node?
    Dan
     

    Tuesday, February 2, 2010 5:04 PM
  • Hi,

    Okay, I have installed the service pack on the active node - the installer ran without error, then rebooted the node.
    As a result of the node rebooting, the cluster resources were failed over to the passive (un-service packed) node.
    When this happened the SQL Server (ComputeCluster) resource failed to come online, and so the HPC resources remained offline.  Further manual attempts to bring the SQL Server service online result in failure. The errors of note returned in the application log are:

    initerrlog: Could not open error log file 'S:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG'. Operating system error = 5(error not found). - A bit odd as the above location is online and accessible from the active node

    The configuration of the AdminConnection\TCP protocol in the SQL instance COMPUTECLUSTER is not valid. - This error is returned by SQLBrowser

    [sqsrvres] StartResourceService: Failed to start MSSQL$ComputeCluster service. CurrentState: 1

    [sqsrvres] OnlineThread: ResUtilsStartResourceService failed (status 435)
    [sqsrvres] OnlineThread: Error 435 bringing resource online.

    If I attempt to move the cluster group back to the service packed node, the same errors are received (the SQL Server resource fails to start).
    Furthermore, I am still unable to install the service pack on the other node due to the error stated in the original post (above).

    So it would appear that installing the service pack on the active node has 'broken' the SQL configuration in some way.
    It's probably also worth noting that I have tried the above with the passive node offline, so as to stop the cluster attempting to move the HPC resources following the SP1 reboot - this has also failed as above.

    I will spend some time looking into the above errors, meanwhile any further assistance would be greatly appreciated.

    I am assuming this should just work!

    Thanks,
    Phil


    Wednesday, February 3, 2010 11:23 AM
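
    The `initerrlog` message in the post above carries a numeric Windows error code that points at the cause. The sketch below is an illustration, not a tool used in the thread: it pulls the code out of such a line and looks it up in a small hand-written table of well-known Win32 system error codes. The function names and the table are assumptions made for this example, and the table is deliberately incomplete. Code 5 is ERROR_ACCESS_DENIED, which suggests a permissions problem on the log folder rather than a missing disk.

```python
import re
from typing import Optional

# A few well-known Win32 system error codes (from winerror.h).
# This table is intentionally tiny; it is not a complete list.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Matches the "Operating system error = N" fragment SQL Server writes.
_OS_ERROR = re.compile(r"Operating system error\s*=\s*(\d+)")


def os_error_code(line: str) -> Optional[int]:
    """Return the operating system error code in `line`, or None if absent."""
    match = _OS_ERROR.search(line)
    return int(match.group(1)) if match else None


def describe(line: str) -> str:
    """Return 'code: NAME' for a log line, using the small table above."""
    code = os_error_code(line)
    if code is None:
        return "no operating system error code found"
    return f"{code}: {WIN32_ERRORS.get(code, 'not in this sketch table')}"


if __name__ == "__main__":
    # The line quoted from the application log in this thread.
    log_line = (
        r"initerrlog: Could not open error log file "
        r"'S:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG'. "
        r"Operating system error = 5(error not found)."
    )
    print(describe(log_line))
```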
  • Hi Phil
    Good to hear that you've got SP1 installed now.
    I raised this issue in this forum a while ago, and it is something Microsoft are aware of. I think the response was that the developers will consider this for the HPC Server 2008 R2 install process.
    Regards
    Dan

    Wednesday, February 3, 2010 1:56 PM