One of the nodes under HPC auto elasticity cannot be turned on

  • General discussion

  • Dear Team,

    We have almost 25 compute nodes (CNs) under auto elasticity, and one of them has an issue. When I go through the Azure operation logs, I see the warning "Don't know how to create node <machinename> on Azure."

    I am unable to submit jobs to it or bring it online.

    Let me know if you need any details.
    Monday, June 18, 2018 6:33 AM

All replies

  • Hi,

      Could you tell us the HPC Pack version you're using?

      Also, how did you add the 25 nodes to HPC Pack? I assume you followed https://docs.microsoft.com/en-us/azure/virtual-machines/windows/classic/hpcpack-cluster-node-autogrowshrink to enable automatic grow/shrink of Azure nodes?


    Qiufang Shi

    Tuesday, June 19, 2018 3:07 AM
  • Hi Qiufang,

    The HPC Pack version is HPC Pack 2016 Update 1.

    We added the 25 nodes to HPC by running the installer on each one and selecting the option to join the existing cluster as a compute node.

    Yes, I followed the same Microsoft document to enable elasticity.

    Let me know if you need any other details.

    Tuesday, June 19, 2018 3:22 AM
  • Hi, 

    Could you run the following PowerShell commands on the head node and share the output? Thanks.

    Add-PsSnapin Microsoft.Hpc

    Get-HpcNode -Name <nodename> | fl


    Tuesday, June 19, 2018 7:21 AM


  • NetBiosName            : machinename
    InstanceName           : domainname\machinename
    DomainName             : domainname and the path
    FullyQualifiedDnsName  : name of the machine
    ManagementIpAddress    : 
    PxeBootMac             : 000D3A004228
    NodeSID                : S-1-5-21-2943997608-406933975-3304974251-12109
    MachineGuid            : fa00a2e0-d812-420e-a8a5-5a7aeff5d65b
    InstanceId             : fb28a56d-d80f-47b7-9aed-077aefca8495
    Location               : 
    Description            : 
    NodeState              : NotDeployed
    NodeHealth             : Unreachable
    HealthState            : Unapproved
    ServiceHealth          : Ok
    Provisioned            : False
    Processors             : {Name="Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz", MaxClockSpeed="2295", L2Cache="0"}
    ProcessorCores         : 16
    Sockets                : 1
    Memory                 : 114688
    OSVersion              : {PlatformId="2", SuiteMask="0", ProductType="Server", OsVersion="6.2.9200", ServicePackVersion="0.0", OsVersionDescription="Microsoft Windows NT 6.2.9200.0", OsArchitecture="AMD64"}
    CcpVersion             : 5.1.6086.0
    CcpInstallationPath    : C:\Program Files\Microsoft HPC Pack 2016\Bin
    Version                : 5.1.6086.0
    Template               : Default ComputeNode Template
    Groups                 : ComputeNodes,AzureIaaSNodes,BatchJob
    ProductKey             : 
    NodeRole               : ComputeNode
    IsHeadNode             : False
    SubscribedCores        : 
    SubscribedSockets      : 
    Affinity               : 
    AzureBatchComputeNodes : 
    AzureInstanceSize      : 
    GPUs                   : 
    Networks               : {Enterprise}


    Tuesday, June 19, 2018 1:50 PM
  • Hi,

    It seems the node ran into an issue: its "Provisioned" property was set to False by mistake. You may have to remove the node from the cluster in HPC Cluster Manager and then start the VM manually from the Azure portal; after that, the node should come back.
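    To spot other nodes in the same state, you could save the `Get-HpcNode -Name <nodename> | fl` output for each node and check it with a script. The sketch below is hypothetical Python, not part of HPC Pack: `parse_format_list` assumes one "Name : Value" pair per line (as in the output posted above), and `is_suspect_azure_node` is an illustrative heuristic that flags nodes in the AzureIaaSNodes group whose Provisioned value is False, the symptom described in this reply.

```python
def parse_format_list(text: str) -> dict[str, str]:
    """Parse PowerShell Format-List (fl) output into a property dict.

    Assumes one 'Name : Value' pair per line; wrapped multi-line
    values are not handled by this sketch.
    """
    props: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        # Split on the first colon only, so values such as
        # 'C:\Program Files\...' keep their own colons.
        name, _, value = line.partition(":")
        props[name.strip()] = value.strip()
    return props


def is_suspect_azure_node(props: dict[str, str]) -> bool:
    """Hypothetical heuristic: an Azure IaaS node whose Provisioned flag is False."""
    groups = {g.strip() for g in props.get("Groups", "").split(",")}
    return "AzureIaaSNodes" in groups and props.get("Provisioned") == "False"


sample = """
NetBiosName            : machinename
NodeState              : NotDeployed
NodeHealth             : Unreachable
Provisioned            : False
CcpInstallationPath    : C:\\Program Files\\Microsoft HPC Pack 2016\\Bin
Groups                 : ComputeNodes,AzureIaaSNodes,BatchJob
"""
props = parse_format_list(sample)
print(props["NodeState"], props["Provisioned"])  # NotDeployed False
print(is_suspect_azure_node(props))  # True
```

    The heuristic only reflects what this thread reports; a node that is legitimately not provisioned would also be flagged, so treat a match as a prompt to investigate rather than a diagnosis.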

    Wednesday, June 20, 2018 2:42 AM
  • Hi,

    I can remove the machine and re-add it to the cluster, but I want to know why this is happening, because we have seen this issue multiple times so far.

    Please let me know why this issue is happening.

    I'm happy to share any details you need.
    Wednesday, June 20, 2018 4:44 AM
  • Hi,

    The cause is that a start/stop operation on the VM failed for some reason (for example, a resource shortage on the Azure side). Under a certain race condition, the "Provisioned" property is then set to False by mistake. This is a known issue, and we will fix it in HPC Pack 2016 Update 2. Please send us (hpcpack@microsoft.com) the latest 5 HpcManagement_AA_xxxxx.bin files from C:\Program Files\Microsoft HPC Pack 2016\Data\LogFiles\Management, together with the name of the affected node and when the issue happened, so that we can confirm whether this is the known issue.
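    If you need to collect those log files more than once, a small script can pick them out. The sketch below is an assumption-based Python helper, not an HPC Pack tool: it takes the directory and the HpcManagement_AA_*.bin naming from the reply, and it interprets "latest" as most recently modified.

```python
from pathlib import Path

# Directory quoted in the reply; adjust if HPC Pack is installed elsewhere.
LOG_DIR = Path(r"C:\Program Files\Microsoft HPC Pack 2016\Data\LogFiles\Management")


def latest_management_logs(log_dir: Path = LOG_DIR, count: int = 5) -> list[Path]:
    """Return the `count` most recently modified HpcManagement_AA_*.bin files, newest first."""
    files = [p for p in log_dir.glob("HpcManagement_AA_*.bin") if p.is_file()]
    # Sort by last-modified time so the newest files come first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


if __name__ == "__main__":
    for path in latest_management_logs():
        print(path)
```

    Sorting by modification time rather than by file name avoids relying on the exact numbering in the elided part of the file name.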

    Thanks

    Wednesday, June 20, 2018 6:54 AM
  • Hi,

    We are waiting for HPC Pack 2016 Update 2, as this issue keeps disrupting our use of HPC.

    This issue has been occurring since 14/6/2018 3:25:07 PM.

    I have emailed the log files to you.

    Awaiting your reply.
    Wednesday, June 20, 2018 7:10 AM