HPC Pack 2016 Update 2 will not start after server reboot

    Question

  • I have installed Windows Server 2016 on a VMware Workstation 15.0 VM.

    Applied the Microsoft patches.

    Installed HPC Pack 2016 Update 2.

    All looks good. I can connect to the HPC server using HPC Cluster Manager.

    All HPC services are running.

    All installations were completed as Administrator, following the steps on the Microsoft HPC Pack installation site.

    After rebooting the server, the HPC services do not start, and I cannot start them manually.

    When I try to connect using HPC Cluster Manager, I get the following alert:

    The connection to the scheduler service failed. detail error: System.AggregateException: One or more errors occurred. ---> Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted. ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 404 (Not Found).

    I have searched for this error but found nothing. Can anybody advise on this, please?

    I have tried recreating the VM many times with slightly different configurations. The result is always the same: after a reboot, the HPC services do not start.

    Tuesday, 6 November 2018 10:31 AM

All replies

  • Can you check the service logs to see why the services do not start?

    The logs are located under %CCP_DATA%LogFiles\ServiceName\*.bin

    You can use %CCP_HOME%BIN\LogParser.exe to parse the .bin logs into plain text.
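As a rough illustration of the log layout described above, the following Python sketch lists the .bin files for one service, oldest first. The function name `find_bin_logs` is hypothetical, and the sketch only locates the files; it does not call LogParser.exe, whose command-line options are not given in this thread.

```python
from pathlib import Path
from typing import List


def find_bin_logs(ccp_data: str, service_name: str) -> List[Path]:
    """Return the .bin logs under <ccp_data>/LogFiles/<service_name>, oldest first.

    Assumption: the directory layout is exactly the one quoted in the reply,
    %CCP_DATA%LogFiles\\ServiceName\\*.bin.
    """
    log_dir = Path(ccp_data) / "LogFiles" / service_name
    if not log_dir.is_dir():
        # No log directory for this service on this machine.
        return []
    # Sort by modification time so the most recent log is the last entry.
    return sorted(log_dir.glob("*.bin"), key=lambda p: p.stat().st_mtime)
```

On the head node this could be called as `find_bin_logs(os.environ["CCP_DATA"], "ServiceName")`, replacing `ServiceName` with the log directory of the service under investigation.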


    Qiufang Shi

    Tuesday, 6 November 2018 8:44 PM
  • Hi Qiufang Shi,

    Thank you for this suggestion. I have parsed the contents of all the log files, but nothing in them indicates why the services have not started. The last entries are from just after installing HPC Pack 2016 Update 2, just before the server was restarted. It's almost as if no attempt was made to start the services.

    I have verified that all HPC services are set to start automatically. After a restart, Event Viewer shows errors for each HPC service similar to:

    "A timeout was reached (30000 milliseconds) while waiting for the HpcSdm service to connect."

    Followed shortly after with:

    "The HpcSdm service failed to start due to the following error: 
    The service did not respond to the start or control request in a timely fashion."
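With one such event per HPC service, it can help to extract the service name and timeout from each message to see exactly which services timed out. A minimal Python sketch, assuming the event text has already been exported as plain strings and matches the wording quoted above (`parse_timeout_event` is a hypothetical helper name):

```python
import re
from typing import Optional, Tuple

# Matches the timeout message quoted above.
TIMEOUT_RE = re.compile(
    r"A timeout was reached \((\d+) milliseconds\) "
    r"while waiting for the (.+?) service to connect"
)


def parse_timeout_event(message: str) -> Optional[Tuple[str, int]]:
    """Return (service_name, timeout_ms), or None if the message does not match."""
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    return match.group(2), int(match.group(1))
```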

    There is nothing in the Security log to indicate that a service does not have permission to run.

    Has anybody tried installing Windows Server 2016 on VMware Workstation 15.0 and then adding HPC Pack 2016 Update 2?

    Wednesday, 7 November 2018 8:23 AM
  • Could you also check the HpcSdm service log?

    Qiufang Shi

    Wednesday, 7 November 2018 4:12 PM
  • After a system restart, the following line is in the HpcSdm log file:

    SrcFile="HpcSdm.exe" SrcFunc="" SrcLine="0" Pid="5072" Tid="2244" TS="0x01d478209ff63f36" String1="Unable to load file SqlConnectionStringProvider.dll. Exception System.IO.FileNotFoundException: Could not load file or assembly 'file:///C:\Program Files\Microsoft HPC Pack 2016\Bin\SqlConnectionStringProvider.dll' or one of its dependencies. The system cannot find the file specified...File name: 'file:///C:\Program Files\Microsoft HPC Pack 2016\Bin\SqlConnectionStringProvider.dll'..   at System.Reflection

    This is followed by multiple lines of:

    SrcFile="HpcSdm" SrcFunc="" SrcLine="0" Pid="5072" Tid="2244" TS="0x01d47820c3e228fd" String1="[Store   ] Exception:.System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.
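The parsed log lines above are a series of `Name="value"` pairs, so they can be split into fields for filtering. The Python sketch below does that and also converts the `TS` field to a date. It assumes `TS` is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC); that assumption is not confirmed in this thread, although the first value above decodes to 9 November 2018, the day of this post. Both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Name="value" pairs; the closing quote is optional because the last
# field (String1) may be cut off, as in the lines quoted above.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"?')

# Assumption: TS is a Windows FILETIME, counted from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_log_line(line):
    """Split one parsed log line into a dict of field name -> value.

    Limitation: a double quote inside a value would end that value early.
    """
    return {m.group(1): m.group(2) for m in FIELD_RE.finditer(line)}


def filetime_to_datetime(ts_hex):
    """Convert a hex TS value such as "0x01d478209ff63f36" to a UTC datetime."""
    ticks = int(ts_hex, 16)  # int() accepts the "0x" prefix when base is 16
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

Filtering on `String1` containing `SqlException` would then pick out the connection failures from the rest of the log.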

    As this service cannot start, the other HPC services do not start either.

    I have verified that the SqlConnectionStringProvider.dll file is not in the location specified.

    How is this file created?

    Friday, 9 November 2018 12:10 PM
  • Thanks Andy,

    You can ignore the first warning, but the second is the real error you're facing. The service can't establish a connection to the management database. Could you check why?

    The database connection string should be loaded from HKLM\Software\Microsoft\HPC\Security\ManagementDbConnectionString

    We then try to use the head node's machine account to connect to the SQL database.
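To see which SQL Server instance the services are trying to reach, the connection string can be read from the registry and split into its parts. The Python sketch below assumes `ManagementDbConnectionString` is a string value under the `HKLM\Software\Microsoft\HPC\Security` key, which is how the path above reads. Both function names are hypothetical, and the parser is deliberately simple: it does not handle quoted values that contain semicolons.

```python
from typing import Dict

# Assumption: key and value name are split this way in the registry.
REG_KEY = r"Software\Microsoft\HPC\Security"
REG_VALUE = "ManagementDbConnectionString"


def read_connection_string() -> str:
    """Read the management database connection string (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, REG_VALUE)
    return value


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split "Name=value;Name=value" into a dict with lower-cased names."""
    parts = {}
    for item in conn.split(";"):
        if "=" in item:
            name, _, value = item.partition("=")
            parts[name.strip().lower()] = value.strip()
    return parts
```

For example, `parse_connection_string(read_connection_string()).get("data source")` would show the server and instance name that the connection string points at, which can then be compared with the instance that is actually installed.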


    Qiufang Shi

    Friday, 9 November 2018 6:49 PM
  • Hi Andy,

    Could you check whether SQL Server is running? ("SQL Server Configuration Manager" -> "SQL Server Services" -> "State" of "SQL Server (COMPUTECLUSTER)")

    If SQL Server is stopped, a message like the one below appears:

    System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.
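The same check can be scripted instead of using the Configuration Manager. The Python sketch below runs `sc query` and extracts the state. It assumes the service behind the COMPUTECLUSTER instance is named `MSSQL$COMPUTECLUSTER` (the usual naming for a SQL Server named instance) and that `sc` prints English labels such as `STATE`. The function names are hypothetical.

```python
import re
import subprocess


def parse_sc_state(output):
    """Return the state word (e.g. "RUNNING") from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", output)
    return match.group(1) if match else None


def sql_instance_state(instance="COMPUTECLUSTER"):
    """Query the Windows service of a SQL Server named instance (Windows only)."""
    result = subprocess.run(
        ["sc", "query", f"MSSQL${instance}"],  # assumed service name
        capture_output=True,
        text=True,
    )
    return parse_sc_state(result.stdout)
```

Anything other than `RUNNING` shortly after boot would be consistent with the connection error above.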

    Chenling

    • Proposed as answer by infySEO Saturday, 10 November 2018 7:39 AM
    Saturday, 10 November 2018 7:10 AM
  • I have seen this happen intermittently on VMs, especially ones with a low CPU and RAM configuration. Because of the VM's poor performance, SQL Server starts slowly, and several HPC services depend on it.

    Please check that your VM settings match the system requirements of HPC Pack 2016 Update 2. I think it's 8 cores and 16 GB of RAM.

    Monday, 12 November 2018 1:36 AM
  • Hi Chenling,

    Thank you for your time on this and the details provided. I have verified that the state of SQL Server (COMPUTECLUSTER) is Running. SQL Server Agent (COMPUTECLUSTER) has the state Stopped. SQL Server Browser is running.

    We recreated the SqlConnectionStringProvider.dll file using details downloaded from here: download.microsoft.com/download/B/D/B/.../HpcSqlConnectionStringPlugin.pdf

    When the file was placed in the location above and the server restarted, some of the HPC services started. However, the HPC Monitoring Client Service, HPC Monitoring Server Service, HPC Web Service and HPC Session Service did not start.

    We can now connect to the server using HPC Job Manager from a workstation on the network.

    I will check the VM settings and update these if required.

    Thanks

    Andy

    Monday, 12 November 2018 11:05 AM