One of the nodes in HPC auto elasticity is unable to turn on

General discussion
Dear Team,
We have almost 25 CNs under auto elasticity, and one of them has an issue. When I go through the Azure operation logs, it gives the warning "Don't know how to create node <machinename> on Azure".
I am unable to submit jobs and unable to bring the node online.
Let me know if you need any details.
Monday, June 18, 2018 6:33 AM
All replies
Hi,
Could you tell us the HPC Pack version you're using?
Then tell us how you added the 25 nodes to HPC Pack. I suppose you're following https://docs.microsoft.com/en-us/azure/virtual-machines/windows/classic/hpcpack-cluster-node-autogrowshrink to enable auto grow/shrink of the Azure nodes?
Qiufang Shi
Tuesday, June 19, 2018 3:07 AM -
Hi Qiufang,
The HPC version is HPC Pack 2016 Update 1. We added the 25 nodes to HPC by running the installer and choosing to join each one as a CN to the existing cluster.
Yes, I followed the same Microsoft document to enable elasticity.
Do let me know if you need any details.
Tuesday, June 19, 2018 3:22 AM -
Hi,
Could you run the following PowerShell commands on the head node and share the result? Thanks.
Add-PsSnapin Microsoft.Hpc
Get-HpcNode -Name <nodename> | fl
Tuesday, June 19, 2018 7:21 AM -
NetBiosName : machinename
InstanceName : domainname\machinename
DomainName : domainname and the path
FullyQualifiedDnsName : name of the machine
ManagementIpAddress :
PxeBootMac : 000D3A004228
NodeSID : S-1-5-21-2943997608-406933975-3304974251-12109
MachineGuid : fa00a2e0-d812-420e-a8a5-5a7aeff5d65b
InstanceId : fb28a56d-d80f-47b7-9aed-077aefca8495
Location :
Description :
NodeState : NotDeployed
NodeHealth : Unreachable
HealthState : Unapproved
ServiceHealth : Ok
Provisioned : False
Processors : {Name="Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz", MaxClockSpeed="2295", L2Cache="0"}
ProcessorCores : 16
Sockets : 1
Memory : 114688
OSVersion : {PlatformId="2", SuiteMask="0", ProductType="Server", OsVersion="6.2.9200", ServicePackVersion="0.0", OsVersionDescription="Microsoft Windows NT 6.2.9200.0", OsArchitecture="AMD64"}
CcpVersion : 5.1.6086.0
CcpInstallationPath : C:\Program Files\Microsoft HPC Pack 2016\Bin
Version : 5.1.6086.0
Template : Default ComputeNode Template
Groups : ComputeNodes,AzureIaaSNodes,BatchJob
ProductKey :
NodeRole : ComputeNode
IsHeadNode : False
SubscribedCores :
SubscribedSockets :
Affinity :
AzureBatchComputeNodes :
AzureInstanceSize :
GPUs :
Networks : {Enterprise}
Tuesday, June 19, 2018 1:50 PM -
Hi,
It seems the node ran into an issue for some reason, and the "Provisioned" property was set to False by mistake. You may have to remove the node in Cluster Manager and then start the VM manually from the Azure portal; after that it should come back.
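To check whether other nodes are in the same state, a query along these lines may help. This is a minimal sketch, not a confirmed diagnostic: it assumes the affected nodes belong to the AzureIaaSNodes group (as in the output above) and that a Provisioned value of False is enough to flag a candidate. A node listed here still needs to be checked by hand before it is removed.

```powershell
# Load the HPC Pack cmdlets; ignore the error if the snap-in is already loaded.
Add-PSSnapin Microsoft.Hpc -ErrorAction SilentlyContinue

# List the nodes in the AzureIaaSNodes group whose Provisioned property reads False.
# Assumption: the group name matches the "Groups" value shown in the output above.
# Comparing the string form works whether the property is a Boolean or text.
Get-HpcNode -GroupName AzureIaaSNodes |
    Where-Object { "$($_.Provisioned)" -eq 'False' } |
    Format-Table NetBiosName, NodeState, NodeHealth, Provisioned -AutoSize
```

Run it on the head node, in the same session as the command above.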
Wednesday, June 20, 2018 2:42 AM -
Hi,
I can remove the machine and re-add it to the cluster, but I want to know why this is happening, because we have seen this issue multiple times so far.
Please let me know why this issue is happening.
Happy to share any details you need.
Wednesday, June 20, 2018 4:44 AM -
Hi,
The reason is that a start or stop operation on the VM failed for some reason (for example, a resource shortage on the Azure side). In a certain race condition, the "Provisioned" property is then set to False by mistake. This is a known issue, and we will fix it in HPC Pack 2016 Update 2. To let us double-check whether this is the known issue, please send the latest 5 HpcManagement_AA_xxxxx.bin files from C:\Program Files\Microsoft HPC Pack 2016\Data\LogFiles\Management to us (hpcpack@microsoft.com), together with the name of the affected node and the time the issue happened.
Thanks
Wednesday, June 20, 2018 6:54 AM -
Hi,
We are waiting for HPC Pack 2016 Update 2, as this issue is a constant concern for our use of HPC.
The issue started on 14/6/2018 at 3:25:07 PM.
I have sent the log files to you by email.
Awaiting your reply.
Wednesday, June 20, 2018 7:10 AM