Azure nodes cannot access service configuration file

    Question

  • Environment: HPC Pack 2016 Update 1, with the head node and part of the cluster on premises. I am trying to burst into Azure IaaS with a SOA service.

    The issue is that the Azure nodes cannot access the service configuration file on the HN, because those nodes are NOT on the domain.

    I tried looking at the hpcpack create/upload option, but apparently there is no CCP_PACKAGE_ROOT environment variable on the Azure node (I use the default HPC Pack 2016 image from the Marketplace).

    1) Why are the nodes unable to use the credentials set under 'Provide installation credentials', or the 'Run As user' credentials set in SessionStartInfo, to access on-premises file shares/resources?

    2) What is the best way to give Azure IaaS nodes access to the on-premises SOA configuration file folder *and* the related SOA DLL/resources?

    Thanks

    Monday, 17 September 2018 3:00 PM

All replies

  • Hi,

    Thanks for contacting us. What's the exact version of HPC Pack you are using? What's the task output of your SOA job?

    You can try the HA configuration file feature, which lets non-domain-joined cluster nodes get configuration files. In the admin UI, go to Configuration -> Services, click "Import High Available Configuration File" on the right panel, and select the configuration file you want to use. It will then show up in the services list, marked as "High Available". Then please retry your job.

    Thanks,
    Zihao

    Tuesday, 18 September 2018 3:12 AM
  • I am on HPC Pack 2016 Update 1 (with fixes) - v5.1.6114

    I tried your suggestion, but it's not working. I can import a file from a different folder, but it does not even show up in HPC Cluster Manager on the head node, although a new folder called 'HighAvailable' is created with the config file inside it.

    I have also started Cluster Manager as Administrator, to no avail. Why is the entry not showing up under Configuration -> Services?

    I still went ahead and submitted a job, and I get a "config file not found" error at %CCP_SERVICEREGISTRATION_PATH%.

    Tuesday, 18 September 2018 3:50 AM
  • 1. Are you adding a service configuration whose non-HA version is already in your service list?

    2. Could you share the entire task output of your SOA job with us?

    Tuesday, 18 September 2018 3:57 AM
  • I tried both ways: (re)adding the file that is already in the services list, *and* deleting the file from the service configuration folder and then using "Import High Available Configuration File". Same result: either way, I cannot get the entry to show up in the services list.

    Tuesday, 18 September 2018 4:02 AM
  • Hi,

    Could you provide some log files for us to troubleshoot your issue?

    1. HpcFrontendService log (%HPC_HN_LOGROOT%\HpcFrontend\HpcFrontend_*.bin)

    2. HpcClusterManager log (%CCP_LOGROOT_USR%\ClusterManager\HpcClusterManager_*.bin)

    3. The task output of your soa job

    Please collect more than 10 log files of each kind, and send them to hpcpack@microsoft.com.

    Thanks,
    Zihao

    Tuesday, 18 September 2018 4:24 AM
  • Is there any documentation on how this 'High Available' feature is supposed to work? Some questions:

    1) Does that feature push (and make available) the SOA service configuration file *and* all the DLLs/folders referenced in that config file to all the non-domain-joined Azure nodes in the cluster?

    2) Do I have to 're-provision' the nodes or would they be available when a job is submitted to the headnode?

    Lastly, I sent an email with the requested info.

    Here is the task output (the failure is expected, since the Azure nodes cannot access a share on the head node because they are not domain-joined). This is when the configuration file is a regular one (High Available = False):

    (HPC2016-1000 is the Azure IaaS node; <HeadNode> is the head node.)

    HpcSoa Information: 10011 : HpcServiceHost entry point is called.
    HpcSoa Information: 11 : [Session:2257] OnAzure = False
    HpcServiceHost32.exe Information: 0 : [GetCertificateValidationCallback] Bypass certificate CN validation.
    HpcServiceHost32.exe Information: 0 : [GetCertificateValidationCallback] Bypass certificate CN validation.
    HpcServiceHost32.exe Information: 0 : [ServiceRegistrationRepo] GetServiceRegistrationPath: Try get file BucatManualUpload.config
    [Main]: Cannot find the service registration file in the following directory(s): 
    \\<HeadNode>\HpcServiceRegistration
     Service cannot be activated. Redeploy the service.
    [Main]: Cannot find the service registration file: . Service cannot be activated. Redeploy the service.
    HpcSoa Information: 11 : [Session:2257] Open dummy service...
    HpcSoa Information: 1002 : Servicehost is started.
    HpcSoa Verbose: 10 : [Session:2257] [HpcServiceHost]: Task Id = 15988
    HpcSoa Verbose: 10 : [Session:2257] [HpcServiceHost]: Number of processors (service capability) = 1
    HpcSoa Information: 11 : [Session:2257] [HpcServiceHost]: Cancel Task Grace Period = 15000
    HpcSoa Information: 11 : [Session:2257] [HpcServiceHost]: First Allocated CoreId = 1
    HpcSoa Information: 11 : [Session:2257] [HpcServiceHost]: EnableMessageLevelPreemption = True
    HpcSoa Error: 13 : [Session:2257] [HpcServiceHost]: Cannot find service registration file.
    HpcSoa Verbose: 10 : [Session:2257] [HpcServiceHost]: WCF network prefix is not set.
    HpcSoa Verbose: 10 : [Session:2257] [HpcServiceHost]: ServiceOperationTimeout = 86400000, MaxMessageSize = 65536
    HpcSoa Information: 11 : [Session:2257] defaultBaseAddr of HostController is net.tcp://HPC2016-1000:9101/2257/15988
    HpcSoa Information: 11 : [Session:2257] Created ServiceHost for controller.
    HpcSoa Information: 11 : [Session:2257] Added endpoint to controller.
    HpcSoa Information: 11 : [Session:2257] Try to call _hostController.Open() below.
    HpcSoa Information: 11 : [Session:2257] Controller opened.
    HpcSoa Verbose: 10 : [Session:2257] [HpcServiceHost]: Dummy service opened on net.tcp://hpc2016-1000:9101/2257/15988/_defaultEndpoint


    • Edited by SRIRAM R Tuesday, 18 September 2018 7:58 PM
    Tuesday, 18 September 2018 2:40 PM
  • Hi,

    Currently we have no documentation about HA configuration files. We are planning to add some.

    1. No. It only delivers the configuration files themselves (through REST), not the DLLs or folders they reference.

    2. Once you can see the file in the HA service list, it is available to all nodes.

    From your task output, it seems your HPC Pack installation does not check the HA configuration store. What's the result if you run:

    cluscfg listenvs

    If CCP_SERVICEREGISTRATION_PATH does not contain CCP_REGISTRATION_STORE, please add it and retry, for example:

    CCP_SERVICEREGISTRATION_PATH=CCP_REGISTRATION_STORE;\\HEADNODE\HpcServiceRegistration
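The check can be scripted. A minimal Python sketch, assuming `cluscfg listenvs` prints one `NAME=VALUE` pair per line (verify against your own output): it reads CCP_SERVICEREGISTRATION_PATH and, if the CCP_REGISTRATION_STORE entry is missing, prepends it to the semicolon-separated list. The function names are hypothetical.

```python
STORE_TOKEN = "CCP_REGISTRATION_STORE"


def parse_envs(listenvs_output: str) -> dict[str, str]:
    """Parse NAME=VALUE lines (assumed output format of `cluscfg listenvs`)."""
    envs = {}
    for line in listenvs_output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            envs[name.strip()] = value.strip()
    return envs


def with_registration_store(path_value: str) -> str:
    """Prepend CCP_REGISTRATION_STORE unless it is already one of the entries."""
    entries = [e for e in path_value.split(";") if e]
    if STORE_TOKEN in entries:
        return ";".join(entries)
    return ";".join([STORE_TOKEN] + entries)
```

The resulting value could then be applied cluster-wide with `cluscfg setenvs CCP_SERVICEREGISTRATION_PATH=<value>`.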

    Thanks,
    Zihao

    Wednesday, 19 September 2018 2:18 AM
  • Yep, I had to add CCP_REGISTRATION_STORE to the CCP_SERVICEREGISTRATION_PATH environment variable. Thanks for that.

    Now, how do I give the Azure nodes permission to access the on-premises folder share where the SOA DLL is located (the DLL location is specified in the HA service config file)?

    Wednesday, 19 September 2018 2:32 AM
  • Hi,

    In the service registration file, you can specify a local folder for the SOA DLL, and then distribute your binaries to that folder on all the compute nodes.

    To do the distribution, you can use node preparation tasks, which can also be defined in the service registration file, as below:

    <service assembly="C:\SampleSvc\SampleSvc.dll" includeExceptionDetailInFaults="true"
             maxConcurrentCalls="0" serviceInitializationTimeout="60000"
             enableMessageLevelPreemption="true" stdError="" maxMessageSize="65536"
             prepareNodeCommandLine="<command line to download service binaries>"
             releaseNodeCommandLine="<(optional) command line to remove service binaries>" />
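Command lines placed in these attributes often contain quotes or `&`, which must be escaped inside an XML attribute. A minimal Python sketch that builds the `service` element shown above with the standard `xml.etree.ElementTree` module and lets the library handle the escaping. The attribute values are copied from the reply; the function name and parameters are hypothetical, and where the element sits within the full registration file is not shown in this thread.

```python
import xml.etree.ElementTree as ET


def service_element(assembly: str, prepare_cmd: str, release_cmd: str = "") -> ET.Element:
    """Build a <service> element like the sample; serializing it escapes
    quotes, ampersands and angle brackets in the command lines."""
    attrs = {
        "assembly": assembly,
        "includeExceptionDetailInFaults": "true",
        "maxConcurrentCalls": "0",
        "serviceInitializationTimeout": "60000",
        "enableMessageLevelPreemption": "true",
        "stdError": "",
        "maxMessageSize": "65536",
        "prepareNodeCommandLine": prepare_cmd,
    }
    if release_cmd:  # releaseNodeCommandLine is optional
        attrs["releaseNodeCommandLine"] = release_cmd
    return ET.Element("service", attrs)
```

For example, `ET.tostring(service_element(r"C:\SampleSvc\SampleSvc.dll", prep_cmd), encoding="unicode")`, with a hypothetical `prep_cmd` string, returns the serialized element with `"` written as `&quot;` and `&` as `&amp;`.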

    Thanks,
    Zihao

    Wednesday, 19 September 2018 3:39 AM
  • The problem would still persist. Whatever is specified in prepareNodeCommandLine has to access the shared folder on the head node, which is on the domain. Since prepareNodeCommandLine runs on the Azure node, and the Azure nodes are NOT on the domain, authentication will fail. I tried RDPing into an Azure node to see if I could access any folder on the HN, and it always prompts with a Windows Security dialog to enter user credentials.

    Wednesday, 19 September 2018 3:49 AM
  • Hi,

    You can instead put your binaries in Azure Blob Storage and download them on the compute nodes in your node preparation tasks.

    Thanks,
    Zihao

    Wednesday, 19 September 2018 4:14 AM
  • 1) How will this setting work for on-premises nodes (I have a hybrid cluster)? Is there a way to configure it so that the on-premises nodes do NOT run the prepareNodeCommandLine that pulls from Azure Blob Storage?

    2) Is there any documentation, or are there examples, on setting prepareNodeCommandLine (for Azure nodes only) to pull from Azure Blob Storage?

    Wednesday, 19 September 2018 11:08 AM
  • Hi,

    Because prepareNodeCommandLine takes a general command line, you can use any method to download from a blob (e.g. PowerShell curl). If you don't want the on-premises nodes to download from Azure Blob Storage, you will need to distribute the binaries into the nodes' local folders yourself. Once you have done that, your prepareNodeCommandLine can check whether the file already exists before initiating a download.
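The skip-if-present logic described above, as a minimal Python sketch. On the nodes this would more likely be a PowerShell or cmd one-liner; Python is used here only to show the control flow, and running it from prepareNodeCommandLine assumes Python is installed on every node. `ensure_binaries` and its `download` callback are hypothetical names; the callback could wrap AzCopy, curl or an HTTP download.

```python
import os
from typing import Callable


def ensure_binaries(local_dll: str, download: Callable[[str], None]) -> bool:
    """Download the service binaries only if local_dll is missing.

    Returns True if a download was performed, False if the file was already
    present (e.g. on an on-premises node where the binaries were pre-copied).
    """
    if os.path.isfile(local_dll):
        return False
    parent = os.path.dirname(local_dll)
    if parent:  # create the target folder, e.g. C:\SampleSvc
        os.makedirs(parent, exist_ok=True)
    download(local_dll)
    return True
```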

    Thanks,
    Zihao

    Thursday, 20 September 2018 1:43 AM
  • Are there any instructions on how to create Azure Blob Storage *and* configure it for HPC Pack (certificates, etc.)? Everywhere I see hpcpack create and hpcpack upload, but *how* exactly do I go about creating one and assigning it to the Azure nodes?

    My architecture is a hybrid one (with the HN on premises), and I plan to burst into Azure IaaS following http://download.microsoft.com/download/B/D/B/BDB8782A-FAAF-457D-AF3D-0B157FEEDF4C/Burst%20to%20Azure%20IaaS%20nodes%20in%20HPC%20Pack.pdf

    This document does not mention SOA or Azure Blob Storage, hence this question.

    Thanks

    Thursday, 20 September 2018 4:03 AM
  • Hi,

    hpcpack is for PaaS nodes only. We plan to extend it for IaaS also, but the function is not available now.

    Using prepareNodeCommandLine does not require any configuration of HPC Pack. You can create a Blob Storage account, upload your files into it, and download them in prepareNodeCommandLine using curl or AzCopy.
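A minimal Python sketch of the download step, assuming the binaries sit in a container readable with a shared access signature (SAS) token. The URL shape `https://<account>.blob.core.windows.net/<container>/<blob>` is the standard Blob Storage endpoint; the account, container and blob names and the token are placeholders, and Python on the node is an assumption (AzCopy or curl avoid that dependency).

```python
import shutil
import urllib.parse
import urllib.request


def blob_url(account: str, container: str, blob: str, sas_token: str) -> str:
    """Compose a blob URL with the SAS token appended as the query string."""
    path = f"/{container}/{urllib.parse.quote(blob)}"  # keeps '/' in blob names
    return f"https://{account}.blob.core.windows.net{path}?{sas_token.lstrip('?')}"


def download(url: str, dest: str, timeout: float = 60.0) -> None:
    """Stream the resource at url into the local file dest."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `download(blob_url("myaccount", "binaries", "SampleSvc.dll", sas), r"C:\SampleSvc\SampleSvc.dll")`, where every name and the `sas` value are hypothetical.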

    Thanks,
    Zihao

    Thursday, 20 September 2018 5:33 AM
  • The issue still persists after marking the service config file on the HN as HA (the HN is on premises). The task output still says the configuration file is not found.
    Friday, 21 September 2018 1:09 AM
  • Please share the full task output.
    Friday, 21 September 2018 1:51 AM
  • Should the config file be made available in *both* the service registration folder and the HighAvailable subfolder?

    Here is the task output (there are 6 tasks, all with the same output):

    etryAsync>d__33`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at Microsoft.Hpc.RetryManager.<InvokeWithRetryAsync>d__33`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.HttpClientExtension.<GetHttpApiCallAsync>d__5.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.HpcRestClient.<GetHttpApiCallAsync>d__16`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<GetMd5Async>d__14.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<GetMd5Async>d__15.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<ExportToTempFileAsync>d__22.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.Internal.ServiceRegistrationRepo.GetServiceRegistrationPath(String filename)
    [Main]: Cannot find the service registration file in the following directory(s): 
    CCP_REGISTRATION_STORE;\\MYHEADNODE\HpcServiceRegistration
     Service cannot be activated. Redeploy the service.
    [Main]: Cannot find the service registration file: . Service cannot be activated. Redeploy the service.
    HpcSoa Information: 11 : [Session:2283] Open dummy service...
    HpcSoa Information: 1002 : Servicehost is started.
    HpcSoa Verbose: 10 : [Session:2283] [HpcServiceHost]: Task Id = 16137
    HpcSoa Verbose: 10 : [Session:2283] [HpcServiceHost]: Number of processors (service capability) = 1
    HpcSoa Information: 11 : [Session:2283] [HpcServiceHost]: Cancel Task Grace Period = 15000
    HpcSoa Information: 11 : [Session:2283] [HpcServiceHost]: First Allocated CoreId = 4
    HpcSoa Information: 11 : [Session:2283] [HpcServiceHost]: EnableMessageLevelPreemption = True
    HpcSoa Error: 13 : [Session:2283] [HpcServiceHost]: Cannot find service registration file.
    HpcSoa Verbose: 10 : [Session:2283] [HpcServiceHost]: WCF network prefix is not set.
    HpcSoa Verbose: 10 : [Session:2283] [HpcServiceHost]: ServiceOperationTimeout = 86400000, MaxMessageSize = 65536
    HpcSoa Information: 11 : [Session:2283] defaultBaseAddr of HostController is net.tcp://MYHEADNODE:9104/2283/16137
    HpcSoa Information: 11 : [Session:2283] Created ServiceHost for controller.
    HpcSoa Information: 11 : [Session:2283] Added endpoint to controller.
    HpcSoa Information: 11 : [Session:2283] Try to call _hostController.Open() below.
    HpcSoa Information: 11 : [Session:2283] Controller opened.
    HpcSoa Verbose: 10 : [Session:2283] [HpcServiceHost]: Dummy service opened on net.tcp://MYHEADNODE:9104/2283/16137/_defaultEndpoint

    Friday, 21 September 2018 1:59 AM
  • Hi,

    The compute nodes failed to get the registration file from the HA store.

    As the output is truncated, could you please send us the service host logs? You can find them in %CCP_LOGROOT_USR%SOA\HpcServiceHost\%CCP_JOBID%\%CCP_TASKINSTANCEID%\Host_*.bin

    In your case, %CCP_JOBID% is 2283; please send us all the .bin files in that folder.
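To gather these files, a minimal Python sketch that globs the path pattern above on the compute node where the task ran. `log_root` stands for the value of CCP_LOGROOT_USR on that node; `os.path.join` inserts a separator only where one is missing, so it works whether or not that value ends with a backslash. Passing `*` as the task instance ID collects logs from every task instance of the job. The function name is hypothetical.

```python
import glob
import os


def host_log_files(log_root: str, job_id: int, task_instance_id: str = "*") -> list[str]:
    """List Host_*.bin files under <log_root>/SOA/HpcServiceHost/<job>/<task instance>/."""
    pattern = os.path.join(log_root, "SOA", "HpcServiceHost", str(job_id),
                           task_instance_id, "Host_*.bin")
    return sorted(glob.glob(pattern))
```

For example, `host_log_files(os.environ["CCP_LOGROOT_USR"], 2283)` lists the host logs of every task instance of job 2283 on that node.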

    Thanks,
    Zihao

    Friday, 21 September 2018 2:08 AM
  • I have upgraded to HPC Pack 2016 Update 2 and still have this issue.

    What's weird is that the prepareNodeCommandLine specified in the config file runs fine on the (non-domain-joined) IaaS node in the hybrid cluster, but the service DLL never appears to execute. The tasks get canceled and new tasks get spun up, to no avail. The SOA DLL is a simple hello-world kind of DLL. Here is the output from a failed task (not sure if this helps):

    n ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.HttpClientExtension.<GetHttpApiCallAsync>d__5.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.HpcRestClient.<GetHttpApiCallAsync>d__18.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.HpcRestClient.<GetHttpApiCallAsync>d__20`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<GetMd5Async>d__15.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<GetMd5Async>d__16.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Rest.ServiceRegistrationRestClient.<ExportToTempFileAsync>d__23.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.Hpc.Scheduler.Session.Internal.ServiceRegistrationRepo.GetServiceRegistrationPath(String filename)
    [Main]: Cannot find the service registration file in the following directory(s): 
    CCP_REGISTRATION_STORE;\\<HN>\HpcServiceRegistration
     Service cannot be activated. Redeploy the service.
    [Main]: Cannot find the service registration file: . Service cannot be activated. Redeploy the service.
    HpcSoa Information: 11 : [Session:2337] Open dummy service...
    HpcSoa Information: 1002 : Servicehost is started.
    HpcSoa Verbose: 10 : [Session:2337] [HpcServiceHost]: Task Id = 16425
    HpcSoa Verbose: 10 : [Session:2337] [HpcServiceHost]: Number of processors (service capability) = 1
    HpcSoa Information: 11 : [Session:2337] [HpcServiceHost]: Cancel Task Grace Period = 15000
    HpcSoa Information: 11 : [Session:2337] [HpcServiceHost]: First Allocated CoreId = 1
    HpcSoa Information: 11 : [Session:2337] [HpcServiceHost]: EnableMessageLevelPreemption = True
    HpcSoa Error: 13 : [Session:2337] [HpcServiceHost]: Cannot find service registration file.
    HpcSoa Verbose: 10 : [Session:2337] [HpcServiceHost]: WCF network prefix is not set.
    HpcSoa Verbose: 10 : [Session:2337] [HpcServiceHost]: ServiceOperationTimeout = 86400000, MaxMessageSize = 65536
    HpcSoa Information: 11 : [Session:2337] defaultBaseAddr of HostController is net.tcp://HPC2016-1016:9101/2337/16425
    HpcSoa Information: 11 : [Session:2337] Created ServiceHost for controller.
    HpcSoa Information: 11 : [Session:2337] Added endpoint to controller.
    HpcSoa Information: 11 : [Session:2337] Try to call _hostController.Open() below.
    HpcSoa Information: 11 : [Session:2337] Controller opened.
    HpcSoa Verbose: 10 : [Session:2337] [HpcServiceHost]: Dummy service opened on net.tcp://hpc2016-1016:9101/2337/16425/_defaultEndpoint

    Tuesday, 9 October 2018 7:43 PM
  • Hi,

    This line shows the service registration file cannot be found:

    [Main]: Cannot find the service registration file in the following directory(s): 
    CCP_REGISTRATION_STORE;\\<HN>\HpcServiceRegistration
     Service cannot be activated. Redeploy the service.
    [Main]: Cannot find the service registration file: . Service cannot be activated. Redeploy the service.

    Could you send us the service host log files for investigation? Thanks.

    Wednesday, 10 October 2018 1:45 AM
  • On my HN, I do not have the path "%CCP_LOGROOT_USR%SOA\HpcServiceHost\%CCP_JOBID%\%CCP_TASKINSTANCEID%\Host_*.bin" that you suggested in the earlier post.

    Wednesday, 10 October 2018 1:49 AM
  • Hi,

    The service host logs need to be collected on the compute nodes.

    Wednesday, 10 October 2018 3:06 AM