How to submit an MPI job to CCS?

  • Question

  • Hi, there,

     

       It is my first time using Windows HPC to run MPI jobs, and I have run into the following problem. When I type the command

     

      job submit /askednodes:Node1,Node2 /numprocessors:3 mpiexec MyApp.exe

     

     the job fails without any failure message. If I type

     

    job submit /askednodes:Node1,Node2 /numprocessors:3 /stdout:\\headnode\fileshare\out.txt mpiexec MyApp.exe

     

    It returned " Failed to create standard output file ..... err code 5'. I have no idea how to solve these problems.

     

    P.S. If I am going to run the code on several nodes, do I need to store the code on each node?

     

    Thank you very much!

     

    Wei

    Friday, April 25, 2008 7:17 AM


All replies

  • Wei,

    What version of CCS are you running (Help -> About)?

     

    There are a few issues here:

    • Jobs shouldn't ever fail without a failure message, so you may have uncovered a bug.  Can you send us the output of running "job view #####" on your failed job?
    • Next, try running "task view ####.1" to view the details of the actual MPI task.  This may contain the failure message/details that you need.  (Example commands for both are shown just below this list.)
    • "If I am going to run the code on several nodes, do I need to store the code on each node?"  Every node that runs the job will need access to "MyApp.exe". This means it needs to be either in the PATH or in the specified working directory for the task; the application can live on a share or locally on each node.

    Some examples of the last point:

    Code Snippet

    job submit /workdir:\\headnode\fileshare mpiexec myapp.exe

    would work if \\headnode\fileshare contained myapp.exe, or if myapp.exe was in the PATH on all of the compute nodes.

     

    Alternatively, give a full path to your application:

    Code Snippet

    job submit mpiexec "C:\Program Files\MyApp\MyApp.exe"

    would also work, provided your application was installed on each machine that your job ran on.
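
    Putting the pieces together with the switches from your original command, and assuming MyApp.exe has been copied to \\headnode\fileshare and that the share is writable by the account the job runs under (error code 5 is the Win32 "access denied" error), something along these lines should work:

    Code Snippet

    job submit /askednodes:Node1,Node2 /numprocessors:3 /workdir:\\headnode\fileshare /stdout:\\headnode\fileshare\out.txt mpiexec MyApp.exe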

     

    Thanks,

    Josh

    Friday, April 25, 2008 5:38 PM
    Moderator
  • Hi, Josh,

     

      Thanks for your answer. I think I have solved that problem now. I have another question about CCS. I am developing a program to access the Windows cluster remotely, transfer the data to the head node, and then call a solver to solve a problem on the cluster with parallel computation. I am wondering how to call the solver from my program. Do you happen to know of any web sites or material covering this kind of implementation? Thanks again.

     

    Wei

    Monday, April 28, 2008 8:41 AM
  • It depends on what you mean by "call the solver."  If you're using the Job/Task mechanisms built into CCS, you would simply provide the command line for your solver as your task's command line.
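
    For example, if your solver is a command-line MPI executable (MySolver.exe and the input path below are just placeholder names), the task could be submitted like this:

    Code Snippet

    job submit /numprocessors:8 mpiexec \\headnode\fileshare\MySolver.exe \\headnode\fileshare\input.dat

    The scheduler allocates the processors, and mpiexec starts the solver's MPI processes on the allocated nodes.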

     

    If your "solver" is more of a service (it runs and requests come in) then it may benefit from the new Service-Oriented Application work we're doing in v2 (go to http://connect.microsoft.com, log into our Beta site, and check the downloads section for some documentation).

     

    If neither of those sounds like your case, I think more details will be needed for us to help you.

     

    Thanks,
    Josh

    Monday, April 28, 2008 6:43 PM
    Moderator
  • Thank you, Josh. I am working on a C++ project like this: on the client side, users provide all the input data for a problem, then I pack the data and send it to a Windows cluster. On the server side, when the cluster gets the data, it invokes the problem solver, which uses parallel computing with MPI. Once the problem is solved, the cluster sends the solution back to the client.

     

    Could you tell me how to start an MPI job from my C++ code? It seems that MPI cannot run on a host that has not been requested by and reserved for a job.

     

    Thanks again.

     

    Wei

     

    Tuesday, April 29, 2008 3:18 AM
  •  

    Yes; due to the security mechanisms we have, MS-MPI jobs can only execute in the context of a job (this also prevents users from running MPI jobs on the cluster when they haven't gone through the queue).

     

    You can use our APIs to submit jobs to the cluster; doing that in your C++ code should give you the behavior that you want.  Are you using CCS 2003 or HPC Server 2008?  The SDK for the former is available from http://download.microsoft.com, and the SDK for the latter is available from http://connect.microsoft.com as part of our Beta program.
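
    In the meantime, a minimal sketch of what "submitting a job from C++" can look like, assuming you simply shell out to the same job submit command line used earlier in this thread rather than using the SDK (the share path, processor count, and MySolver.exe are placeholders), would be:

    Code Snippet

    // Minimal sketch: build the "job submit" command line shown earlier in this
    // thread and hand it to the command processor. This is a stopgap, not the
    // CCS SDK; the head node share and MySolver.exe below are placeholders.
    #include <cstdlib>
    #include <string>

    int SubmitSolverJob(const std::string& inputFile)
    {
        std::string cmd =
            "job submit /numprocessors:8 "
            "/workdir:\\\\headnode\\fileshare "
            "/stdout:\\\\headnode\\fileshare\\out.txt "
            "mpiexec MySolver.exe " + inputFile;

        // std::system returns the exit status of the command processor;
        // a non-zero value generally indicates that the submission failed.
        return std::system(cmd.c_str());
    }

    Note that the scheduler may prompt for credentials the first time you submit from a given account, so the SDK route is the better long-term answer; it gives you programmatic control over the job after submission.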

     

    Thanks,
    Josh

     

     

    Tuesday, April 29, 2008 5:11 PM
    Moderator
  • Some helpful links on how to submit jobs to the cluster using the C++ APIs can be found here:

    http://msdn2.microsoft.com/en-us/library/bb540429(VS.85).aspx

     

     

     

    Wednesday, May 7, 2008 4:33 PM