Cluster Manager not finding network locations

  • Question

  • Hi,

    I have a cluster of four nodes running Windows Server 2008 HPC Edition. I use them for running a huge program on a network location. It has been working OK but some time ago I had to migrate to another network server. That's when the problems began. I have a scheduled task that submits a job to the HPC Cluster Manager but the job fails instantly. When I click on the error, it gives this information:

    Error from node: CLSTR1:Exception 'Working directory '\\testnew.mvrsp.com\testing\2013\bin\cluster' does not exist' reported creating the task.

    If I copy the path and paste it into Windows Explorer on node CLSTR1 (or on any other node), it opens directly with no questions asked. The path to the previous server was almost exactly the same (\\test.mvrsp.com\testing\2013\bin\cluster) and it worked well.

    Any idea why HPC Cluster Manager cannot access the new path when it is accessible from Windows Explorer?

    Thanks!

    Thursday, June 20, 2013 7:13 AM

All replies

  • Are the effective credentials of the failing task valid, and do _those_ credentials have permission to access the desired share, path, and directory?

    By default the RunAs credentials are inherited from the identity that creates/submits the job.  They can be overridden on the job before it is submitted.

    This applies to the failing task AND the task that creates the job containing the failing task (remember the inheritance rule).

    When you put the UNC path into Explorer, you are confirming that _your_ credentials can see the share (you did not mention whether you confirmed, or need, write access).  This test yields helpful information but is not authoritative for the failing task.

    d



    • Edited by DarylMsft Friday, June 21, 2013 4:59 PM clarity
    Thursday, June 20, 2013 7:05 PM
  • I don't think I fully understand what you mean. I get the same error if I submit the job manually with the 'job submit' command from the command line. How can I verify which credentials I submit the job with and which credentials the job runs with?
    Monday, June 24, 2013 3:36 PM
  • Try "job submit whoami"

    or "job submit set" and look at the redirected output for "username".

    Both of these approaches will disclose the identity of the security context of the task.
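
    If the submission is scripted (for example from the scheduled task), the probe can be built the same way. The following is only a minimal sketch: the helper name `identity_probe_command` and the output file name are hypothetical, and it assumes the `/stdout:` parameter listed by "job submit /?" is used for the redirection.

```python
def identity_probe_command(stdout_path=None):
    """Build a `job submit` command whose single task runs `whoami`.

    stdout_path is a hypothetical file the task's output is redirected to;
    it must be writable by whatever identity the task ends up running as.
    """
    cmd = ["job", "submit"]
    if stdout_path:
        # /stdout: redirects the task's standard output to a file.
        cmd.append(f"/stdout:{stdout_path}")
    # The task itself only prints the identity of its security context.
    cmd.append("whoami")
    return cmd


# To actually submit, pass the list to subprocess.run(cmd, check=True)
# on a machine where the HPC client utilities are installed.
```

    The identity written to the output is the one that must have access to the working directory, which may differ from the account used to open the path in Explorer.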

    d

    Tuesday, June 25, 2013 7:57 PM
  • Hi, you are right. I am submitting the job as CLSTR1\tester, while the network location is accessible to COM\tester. I cannot switch the domain of the cluster. Is there a way to let CLSTR1\tester access the locations available to COM\tester, by tweaking the tasks or applying some other trick in Windows?
    Friday, June 28, 2013 6:13 AM
  • Possibly.  Every job has RunAs creds... and _those_  are used when creating the process for each task.

    Please review "job submit /?" and look for the parameters: /user and /password.
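
    As a rough illustration of overriding the RunAs identity, here is a sketch that builds such a command line. The helper name `runas_submit_command` is hypothetical, the account and path are the ones mentioned in this thread, and `/workdir:` is assumed to be the parameter that sets the working directory. The password is deliberately left out so that it does not end up in a script; as far as I recall, "job submit" then prompts for it, but please verify that against "job submit /?".

```python
def runas_submit_command(user, workdir, task_command):
    """Build a `job submit` command that runs under an explicit RunAs account.

    user         -- account in DOMAIN\\name form, e.g. COM\\tester
    workdir      -- working directory for the task (may be a UNC path)
    task_command -- list holding the program to run and its arguments
    """
    return [
        "job", "submit",
        f"/user:{user}",        # RunAs identity for the job's tasks
        f"/workdir:{workdir}",  # directory the scheduler checks when creating the task
        *task_command,
    ]


cmd = runas_submit_command(
    r"COM\tester",
    r"\\testnew.mvrsp.com\testing\2013\bin\cluster",
    ["whoami"],
)
print(" ".join(cmd))
```

    Submitting "whoami" first is a cheap way to see whether the nodes accept the account at all before pointing the job at the real program.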

    But... since the compute nodes are probably in the domain of the headnode, your compute nodes may refuse (to grant a logon token for) creds that would work for your fileshare, meaning RunAs would not provide a solution.

    If you cannot set up trust between the domains, you may have to (write code to) give the secrets to the task .exe so it can provide acceptable creds to the fileshare as it attempts access.  Job/Task custom properties and/or environment variables might help with that approach.
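
    One possible shape for that approach is a small wrapper that the task runs instead of the real program. This is only a sketch under several assumptions: the environment variable names SHARE_USER and SHARE_PASSWORD are hypothetical, the share root is inferred from the failing path, and the connection is made with the standard Windows "net use" command. Passing a password on a command line exposes it to anyone who can list processes on the node, so treat this as a starting point rather than a hardened solution.

```python
import os
import subprocess

# Share root inferred from the failing working directory in this thread.
SHARE = r"\\testnew.mvrsp.com\testing"


def net_use_connect(share, user, password):
    """Command that connects to the share with explicit credentials."""
    return ["net", "use", share, password, f"/user:{user}"]


def net_use_disconnect(share):
    """Command that drops the connection again."""
    return ["net", "use", share, "/delete"]


def run_with_share(program_args, env=os.environ):
    """Connect to the share, run the real program, then disconnect.

    Credentials are read from hypothetical environment variables that
    the job is expected to provide to the task.
    """
    user = env["SHARE_USER"]
    password = env["SHARE_PASSWORD"]
    subprocess.run(net_use_connect(SHARE, user, password), check=True)
    try:
        # Run the actual workload and hand back its exit code.
        return subprocess.run(program_args).returncode
    finally:
        # Always drop the connection, even if the program failed.
        subprocess.run(net_use_disconnect(SHARE), check=False)
```

    The task would call run_with_share() with the real program's command line. Because the error above is raised while the task is being created, the job would presumably also need a working directory that the RunAs account can already reach (a local folder, for instance), with the program addressing the share by its full UNC path.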

    There are other tricks of various complexities.   But start with establishing trust between the domains...

    d

    Saturday, June 29, 2013 1:42 AM