SOA error on hybrid cluster w/non-domain joined IaaS nodes

    Question

  • I get the following error when a SOA service is deployed to the cluster: "The node <NodeName> is required by one of the tasks in the job, and may not be added to the exclusion list."

    This is probably caused by the EndPointNotFoundRetryPeriod/maxExcludedNodes settings of the broker node. How do I clear the excluded nodes list on the head node? The get-hpcjob cmdlet appears to clear the list only for the job, not cluster-wide.

    This happens even though the node health and reachability show Online/OK.

    Please advise.

    Tuesday, 9 October 2018 4:05 AM

All replies

  • Upon investigating, this seems to happen when the prepareNodeCommandLine fails. In my case, I get an exit code of 267, and the session/HPC immediately puts the nodes into the exclusion list.

    This happens when I use an AzCopy command as the prepareNodeCommandLine to copy files from a blob storage container to the IaaS nodes. What's weird is that I can RDP into an IaaS node, run the same AzCopy command, and it works just fine. Apparently, when AzCopy runs through prepareNodeCommandLine, it cannot create the destination directory if it does not already exist on the IaaS node. Here is the command that I used:

    prepareNodeCommandLine="AzCopy /Source:https://????.blob.core.windows.net/testcontainer/testsoa /Dest:C:\Users\Public\TestHPCService\TestSOA /SourceKey:<key> /S /MT /XO /Y"
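
    One way to test the missing-directory theory is to have the prepare step create the destination folder itself before AzCopy runs. The following is only a minimal sketch: it assumes Python is installed on the IaaS nodes (not stated above), and the script name prepare_node.py and the function prepare_node are hypothetical. It also assumes that the 267 is the Windows system error ERROR_DIRECTORY ("The directory name is invalid"), which is why the wrapper also starts the command from the newly created folder.

```python
import os
import subprocess
import sys


def prepare_node(dest_dir, command):
    """Create dest_dir if it is missing, run command, and return its exit code.

    Hypothetical helper: dest_dir and command are supplied by the caller.
    """
    # Create the full destination path up front; no error if it already exists.
    os.makedirs(dest_dir, exist_ok=True)
    # Start the child process in the destination folder so that it never
    # inherits a working directory that does not exist on the node.
    completed = subprocess.run(command, cwd=dest_dir)
    return completed.returncode


# Hypothetical usage: python prepare_node.py <dest_dir> <command> [args...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    # Pass the child's exit code through so the caller still sees failures.
    sys.exit(prepare_node(sys.argv[1], sys.argv[2:]))
```

    With such a wrapper, prepareNodeCommandLine would call the script with C:\Users\Public\TestHPCService\TestSOA as the first argument, followed by the unchanged AzCopy command. If the nodes still land in the exclusion list, the missing directory is probably not the cause.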

    Any pointers as to why this may be happening?

    I am on HPC Pack 2016 Update 2.

    Tuesday, 9 October 2018 7:32 PM