Unable to provision workstation node/unmanaged server node.

  • Question

  • Hi,

    I've set up an HPC head node and I'm trying to add unmanaged server nodes as well as workstation nodes.

    Provisioning fails - in the HPC Manager I see the following:

    Could not contact node 'HPCCLUSTER' to perform change. Could not connect to net.tcp://130.83.248.133:6730/IClusterNodeInternal. 
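One way to tell a plain network or firewall problem apart from a service that is not running on the node is to try a raw TCP connection to the endpoint in that message from the head node. This is a minimal sketch, not part of HPC Pack: the helper names are hypothetical, it assumes Python 3 is available on the head node, and a successful connect only proves that something is listening on port 6730, not that the HPC management service is healthy.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def endpoint_from_url(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a net.tcp:// endpoint URL like the one in the error."""
    parts = urlsplit(url)
    # urlsplit parses the //host:port part for any scheme, including net.tcp
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host/port in {url!r}")
    return parts.hostname, parts.port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable network, etc.
        return False
```

For example, `can_connect(*endpoint_from_url("net.tcp://130.83.248.133:6730/IClusterNodeInternal"))` returns False while the management service on the node is down or the port is blocked.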

    On the node, in Event Viewer, I find the following errors:
    Application: HpcManagement.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.Runtime.InteropServices.SEHException
       at System.Runtime.InteropServices.CustomMarshalers.EnumerableViewOfDispatch.GetEnumerator()
       at System.Collections.IEnumerable.GetEnumerator()
       at Microsoft.ComputeCluster.Management.NetworkModel.ScmContainsFirewallService.DiscoverRuleGroupState(Microsoft.SystemDefinitionModel.Manager.ISession, Microsoft.ComputeCluster.Management.NetworkModel.FirewallService)
       at Microsoft.ComputeCluster.Management.NetworkModel.ScmContainsFirewallService.ValidateFirewall(Microsoft.SystemDefinitionModel.Manager.ISession)
       at Microsoft.ComputeCluster.Management.ComputerModel.ServiceControlManager.Validate(Microsoft.SystemDefinitionModel.Manager.ISession)
       at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.ValidateOs(Microsoft.SystemDefinitionModel.Manager.ISession)
       at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.Validate(Microsoft.SystemDefinitionModel.Manager.ISession)
       at Microsoft.SystemDefinitionModel.SystemManager.DiscoverChange.OnExecute(Microsoft.SystemDefinitionModel.Manager.ISession)
       at Microsoft.SystemDefinitionModel.Change.Transition(Microsoft.SystemDefinitionModel.Manager.ChangeTransition, Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.Change.Execute(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.ModelUpdate.CommitChanges(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.ModelUpdate.Commit(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.ChangeAction.CommitNestedChange(Microsoft.SystemDefinitionModel.Manager.ISession, Microsoft.SystemDefinitionModel.ModelUpdate)
       at Microsoft.SystemDefinitionModel.ChangeAction.Execute(Microsoft.SystemDefinitionModel.Session)
       at Microsoft.SystemDefinitionModel.ExecutionEngine+ActionWrapper.Execute(Microsoft.SystemDefinitionModel.Session)
       at Microsoft.SystemDefinitionModel.ExecutionEngine.ExecuteAction(System.Collections.Generic.Queue`1<ActionWrapper>, Microsoft.SystemDefinitionModel.Session)
       at Microsoft.SystemDefinitionModel.ExecutionEngine.InternalExecuteChange(Microsoft.SystemDefinitionModel.Change, Microsoft.SystemDefinitionModel.SdmErrorCollection, Microsoft.SystemDefinitionModel.IExecutionStatus)
       at Microsoft.SystemDefinitionModel.ExecutionEngine.ExecuteChange(Microsoft.SystemDefinitionModel.Change, Microsoft.SystemDefinitionModel.SdmErrorCollection, Microsoft.SystemDefinitionModel.IExecutionStatus)
       at Microsoft.SystemDefinitionModel.ExecutionEngine.ExecuteChange(Microsoft.SystemDefinitionModel.Change, Microsoft.SystemDefinitionModel.SdmErrorCollection, Boolean, Microsoft.SystemDefinitionModel.IExecutionStatus)
       at Microsoft.ComputeCluster.Management.ExecutionEngineWithEvents.ExecuteChange(Microsoft.SystemDefinitionModel.Change, Microsoft.SystemDefinitionModel.SdmErrorCollection, Boolean, Microsoft.SystemDefinitionModel.IExecutionStatus)
       at Microsoft.SystemDefinitionModel.ModelUpdate.RunEngine(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.ModelUpdate.CommitChanges(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.ModelUpdate.Commit(Microsoft.SystemDefinitionModel.SdmErrorCollection)
       at Microsoft.SystemDefinitionModel.UpdateWorker.Execute()
       at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
    
    
    Does anyone know a solution?
    Thanks in advance!

    Wednesday, April 29, 2020 5:22 AM

All replies

  • What HPC Pack version are you using?

    On which node did you see the exception?

    Saturday, May 9, 2020 9:41 AM
  • HPC Pack 2016 Update 3.
    I downloaded 'HPCPack2016Update3-Full-v6435'.

    The exception occurs on the compute node.

    Saturday, May 9, 2020 4:28 PM

  • Have you tried rebooting the compute node?

    Could you try to:

    1. Reinstall amd64\vcredist_x64.exe and i386\vcredist_x86.exe.

    2. Check whether the "Windows Defender Firewall" service is running. If not, please start it.
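Step 2 can also be checked from a script. A minimal sketch, assuming Python 3 on the node and that the firewall service has its usual service name `MpsSvc`; the helper names are hypothetical. It only reads the state that `sc query` reports and does not start anything.

```python
import re
import subprocess
import sys
from typing import Optional

# Service Control Manager name of the Windows (Defender) Firewall service.
FIREWALL_SERVICE = "MpsSvc"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def firewall_service_state() -> Optional[str]:
    """Query the firewall service through sc.exe (Windows only)."""
    if sys.platform != "win32":
        raise OSError("sc.exe is only available on Windows")
    result = subprocess.run(
        ["sc", "query", FIREWALL_SERVICE],
        capture_output=True, text=True, check=False,
    )
    return parse_sc_state(result.stdout)
```

If the reported state is STOPPED, running `sc start MpsSvc` from an elevated prompt starts the service.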

    By the way, for a fresh installation of HPC Pack 2016 Update 3, we recommend installing HPCPack2016Update3-Full-Refresh-v6450.zip directly, which includes a number of fixes and improvements on top of 6435.

    Monday, May 11, 2020 5:38 AM
  • Hi,

    I reinstalled and rebooted several times. I also tried every combination of the Firewall service on/off, reinstalled the VC redistributables, and tried the Full-Refresh-v6450 package.

    The error persists. On the (supposed) compute node, Event Viewer shows 3 errors for every provisioning attempt.
    The first one is the .NET exception listed above, something about firewall checking.
    The next two seem to be follow-ups:

    Faulting application name: HpcManagement.exe, version: 5.3.6450.0, time stamp: 0x5e42c636
    Faulting module name: FirewallAPI.dll, version: 6.3.9600.18895, time stamp: 0x5a4b0a8d
    Exception code: 0xc000001d
    Fault offset: 0x000000000002e218
    Faulting process id: 0xf8c
    Faulting application start time: 0x01d6275ddb967987
    Faulting application path: C:\Program Files\Microsoft HPC Pack 2016\Bin\HpcManagement.exe
    Faulting module path: C:\Windows\System32\FirewallAPI.dll
    Report Id: 34b081ae-9351-11ea-8107-0025905cf220
    Faulting package full name: 
    Faulting package-relative application ID: 
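The exception code in this fault report is an NTSTATUS value, and decoding it is more informative than the generic .NET SEHException: 0xc000001d is STATUS_ILLEGAL_INSTRUCTION, i.e. the report indicates the processor could not execute an instruction while running code in FirewallAPI.dll. That is a hint, not a diagnosis. A small illustrative decoder follows; the names are hypothetical and the lookup table only holds two well-known codes.

```python
from typing import NamedTuple

# A few well-known NTSTATUS values; extend as needed.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


class NtStatus(NamedTuple):
    severity: str
    customer: bool
    facility: int
    code: int
    name: str


def decode_ntstatus(value: int) -> NtStatus:
    """Split a 32-bit NTSTATUS into its bit fields."""
    value &= 0xFFFFFFFF
    return NtStatus(
        severity=SEVERITY[value >> 30],       # bits 31-30
        customer=bool((value >> 29) & 1),     # bit 29: customer-defined code
        facility=(value >> 16) & 0xFFF,       # bits 27-16 (bit 28 is reserved)
        code=value & 0xFFFF,                  # bits 15-0
        name=KNOWN_STATUS.get(value, "unknown"),
    )
```

For the value in the report, `decode_ntstatus(int("0xc000001d", 16))` yields severity "error", facility 0, code 0x1D.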

    And

    Windows cannot access the file  for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program HPC Management Service because of this error.
    
    Program: HPC Management Service
    File: 
    
    The error value is listed in the Additional Data section.
    User Action
    1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
    2. If the file still cannot be accessed and
    	- It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
    	- It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
    3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
    4. If the problem persists, restore the file from a backup copy.
    5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.
    
    Additional Data
    Error value: 00000000
    Disk type: 0

    Monday, May 11, 2020 6:42 AM
  • It seems the service failed to access a system file.

    What is your OS version? Have you tried on another machine?

    Tuesday, May 12, 2020 7:04 AM
  • The node is running Windows Server 2012.

    We also have another node running Windows Server 2016, and it has the same problem.

    These problems started with the latest Windows updates. We had a working HPC Pack 2012 R2 cluster; after the latest batch of updates (KB4532940, KB4550961, KB4540725) it stopped working after a reboot.
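To compare machines (for example the Server 2012 and Server 2016 nodes), it can help to list which of the three updates named above are actually installed on each. A sketch assuming Python 3 and PowerShell on the node; the helper names are hypothetical. `Get-HotFix` reads Win32_QuickFixEngineering, which may not list every installed update, so an empty result is not proof that an update is absent.

```python
import re
import subprocess
import sys
from typing import Iterable, Set

# Updates named in this thread as the point where the cluster stopped working.
SUSPECT_KBS = {"KB4532940", "KB4550961", "KB4540725"}


def parse_hotfix_ids(text: str) -> Set[str]:
    """Collect every KBnnnnnnn token from command output, normalized to upper case."""
    return {kb.upper() for kb in re.findall(r"KB\d+", text, flags=re.IGNORECASE)}


def installed_hotfixes() -> Set[str]:
    """List installed hotfix IDs via PowerShell's Get-HotFix (Windows only)."""
    if sys.platform != "win32":
        raise OSError("Get-HotFix is only available on Windows")
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return parse_hotfix_ids(result.stdout)


def suspect_updates_present(installed: Iterable[str]) -> Set[str]:
    """Return the subset of SUSPECT_KBS found in the installed list."""
    return SUSPECT_KBS & {kb.upper() for kb in installed}
```

On a node, `suspect_updates_present(installed_hotfixes())` prints which of the three updates that machine has.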

    After several unsuccessful attempts to get HPC Pack 2012 R2 working, I decided to upgrade to HPC Pack 2016.

    Wednesday, May 13, 2020 11:29 AM