Immediate Job Failure with HPC Job Manager
- I'm using Windows HPC Pack 2008 with the cluster running HPC Server 2008.
When I try to submit jobs using HPC Job Manager 2008, they fail immediately and no reason is given.
I have tried both a parametric non-MPI task and a normal MPI task, to no avail. The problem I think I am having (but then again, I have no clue) is that I don't understand what the working directory represents. I read that this directory needs to be a Universal Naming Convention (UNC) path, which makes me believe that the directory I give it needs to be shared with the cluster. So far, all I have been putting in the working directory field is the path to where my executables are, which may or may not be correct. Once again, I don't know if this is the cause of my trouble; it is just my guess.
Any help is appreciated,
Thanks
Answers
- AJ,
If your jobs are failing, you should be able to see an error message. Try double-clicking the failed job in the UI and looking at the "Results" page.
Your Working Directory is the path where your job will be started. One way to think of it is that on the compute node where your job runs, the system will do "cd <your working directory>" and then "cmd.exe /c <your command line>". So the working directory path needs to be something that is accessible from any compute node where your job will run.
Some examples would be:
C:\Program Files\MyApp\ - This would use the local directory on each compute node, and therefore assumes that your command line, input, and output file paths are available relative to this path on all the machines (i.e., this would be a good choice if your application is locally installed on every machine in the cluster)
\\someserver\someshare\ - This would connect to a share, and is an excellent choice if your applications or data files are stored on a file server
Working Directory is always optional . . . by default the system will use %USERPROFILE% (C:\Users\Username). This works well if your input and output files are given as full paths, or if your application is in the PATH on each machine.
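As a rough illustration of the three cases above (a local path, a share, or nothing at all), here is a minimal Python sketch that classifies a working-directory string. It is not part of HPC Pack: the function name describe_working_directory and its return labels are hypothetical, and it only inspects the shape of the path string. It cannot tell you whether the path actually exists or is reachable from a compute node.

```python
from pathlib import PureWindowsPath


def describe_working_directory(path: str) -> str:
    """Classify a Windows working-directory string (hypothetical helper).

    Only the string is inspected; nothing is checked on disk or on the network.
    """
    if not path:
        # No working directory given: per the reply above, the system
        # falls back to %USERPROFILE% on the compute node.
        return "default"

    p = PureWindowsPath(path)
    if p.drive.startswith("\\\\"):
        # A UNC path such as \\someserver\someshare: the same location
        # is addressed from every node that can reach the share.
        return "unc-share"
    if p.drive and p.root:
        # A drive-letter path such as C:\Program Files\MyApp: it must
        # exist locally on every compute node that may run the job.
        return "local-absolute"
    # Anything else is resolved relative to some other location.
    return "relative"


if __name__ == "__main__":
    examples = [
        "",
        "\\\\someserver\\someshare\\",
        "C:\\Program Files\\MyApp\\",
        "data\\run1",
    ]
    for example in examples:
        print(repr(example), "->", describe_working_directory(example))
```

A "local-absolute" result is the case to double-check by hand: the path has to be present on each node, not just on the machine you submit from.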
Thanks,
Josh
-Josh
- Marked As Answer by AJ Fret, Tuesday, June 02, 2009 5:40 PM
- Proposed As Answer by Josh Barnard (MSFT, Owner), Monday, June 01, 2009 8:54 PM
- Unproposed As Answer by AJ Fret, Tuesday, June 02, 2009 5:40 PM
All Replies
- Alright,
You helped me fix my previous problem, but I have another that is probably along the same lines. I'm currently getting the error...
"Error (14001) The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log for more detail."
After some digging, I found that this error can occur when I'm not using the release version of my code and the required DLLs are not available to it. (Once again, I don't know if this is the problem.)
If this is the problem, how do I go about fixing it? If it is not, any other help is appreciated.
Thanks again,
AJ

