SOA Service Loading Test fails on compute nodes, but succeeds on head node

    Question

  • I recently provisioned a new HPC Pack 2016 Update 1 cluster and am attempting to configure a few SOA jobs. The jobs run successfully only on the head node, and the SOA Service Loading diagnostic test confirms this: it passes only on the head node. The cluster was set up with a self-signed certificate, which is installed in the Trusted Root Certification Authorities store.

    Editing the configuration files for the HpcBrokerWorker or HpcServiceHost services to enable tracing did not generate any trace files.

    The network topology places all nodes on both an enterprise and a private network, and I can successfully ping between the nodes on the private network. The only error message I see comes from the diagnostic:

    • Response status code does not indicate success: 401 (Unauthorized). 

    The test is being run as a Domain Admin, and the Users group on both the head node and the compute node already contains NT AUTHORITY\Authenticated Users. I have also enabled the Analytic and Debug logs in Event Viewer, but nothing shows up that helps diagnose this issue.

    Wednesday, May 16, 2018 21:05

Answers

  • Hi,

    You are using a domain-joined HPC Pack 2016 Update 1 cluster with a single head node, and starting SOA jobs under the cluster admin's identity, am I right?

    If so, please check whether your certificate is installed in the LocalMachine\My store on your compute nodes, and make sure the certificate can be accessed by the user who starts the SOA session.

    There is also a workaround for this; see "SOA job stuck at 0% if started by a non-administrator HPC user" on the Known Issues page.

    Thanks,
    Zihao

    • Marked as answer by KB_apl, Thursday, May 17, 2018 21:26
    Thursday, May 17, 2018 03:03
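To make the first check in the answer concrete, here is a minimal Python sketch that compares the cluster certificate's thumbprint against the thumbprints found in each node's LocalMachine\My store. It assumes you have already collected those listings yourself (for example, from the Certificates (Local Computer) snap-in or PowerShell's `Cert:\LocalMachine\My` drive). The node names, the short illustrative thumbprints, the `stores` dictionary, and all three helper functions are hypothetical and not part of HPC Pack.

```python
import hashlib

# Hypothetical helpers, not shipped with HPC Pack. They assume the thumbprints
# in each node's LocalMachine\My store have already been exported by hand.

HEX_DIGITS = set("0123456789abcdefABCDEF")


def normalize_thumbprint(raw: str) -> str:
    """Keep only hex digits and upper-case them.

    Thumbprints copied from the Windows certificate dialog may contain spaces
    or an invisible leading Unicode mark, which would break a plain string
    comparison.
    """
    return "".join(ch for ch in raw if ch in HEX_DIGITS).upper()


def thumbprint_of(der_bytes: bytes) -> str:
    """SHA-1 over the DER-encoded certificate, the value Windows shows as Thumbprint."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def nodes_missing_cert(expected: str, stores: dict[str, list[str]]) -> list[str]:
    """Return the sorted names of nodes whose LocalMachine\\My listing lacks the cert."""
    want = normalize_thumbprint(expected)
    return sorted(
        node
        for node, prints in stores.items()
        if want not in {normalize_thumbprint(p) for p in prints}
    )


if __name__ == "__main__":
    # Hypothetical listings gathered from each node (shortened for readability).
    stores = {
        "HEADNODE": ["AB 12 CD"],
        "CN01": [],
        "CN02": ["ab12cd", "FFFF"],
    }
    print(nodes_missing_cert("AB12CD", stores))  # ['CN01']
```

This only checks that the certificate is present; whether the user starting the SOA session can actually access it (for example, its private key permissions) still has to be verified on each node.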

All Replies

  • The cert was not in the location you specified (LocalMachine\Personal), but the workaround in the linked article resolved my issue. I'm not sure why that setting was not valid (nor did I check what it was before I changed it). I had assumed it was unrelated to my issue since I was using an administrator account.

    Thank you once again Zihao.

    Thursday, May 17, 2018 21:26