OpenMPI integration with Job Manager

  • Question

  • I want to use an application called migrate-n (http://popgen.sc.fsu.edu/Migrate/Migrate-n.html), which is distributed as a Windows binary built against the OpenMPI library. I am running Windows HPC Server 2008 R2.

    I installed OpenMPI (1.5.4 win32 release) on a single compute node and can run directly on the node with the command:

    mpirun --hostfile hostfile.txt -np 8 C:\sw\migrate-3.2.15\migrate-n-mpi.exe parmfile -nomenu

    (Note: I have put the OpenMPI bin folder first in the PATH so that MS-MPI is not picked up.)
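    For reference, an OpenMPI hostfile lists one host per line, optionally followed by slots=N. The Python below is a minimal sketch that builds that text and the mpirun argument list used in the single-node command above. Assumptions: the node is QUB-HPC-CN-18 (the name that appears in the error log later in this post), it offers 8 slots (matching -np 8), and the helper names hostfile_text and mpirun_argv are hypothetical, for illustration only.

```python
# Minimal sketch: build an OpenMPI hostfile and the mpirun argv for a
# single-node run. Node name and slot count are illustrative assumptions.


def hostfile_text(slots_per_host: dict[str, int]) -> str:
    """Return hostfile contents: one 'hostname slots=N' line per host."""
    return "".join(f"{host} slots={n}\n" for host, n in slots_per_host.items())


def mpirun_argv(hostfile: str, nprocs: int, exe: str, *app_args: str) -> list[str]:
    """Build the mpirun command line as a list (avoids shell quoting issues)."""
    return ["mpirun", "--hostfile", hostfile, "-np", str(nprocs), exe, *app_args]


if __name__ == "__main__":
    # Hypothetical: one compute node with 8 slots.
    print(hostfile_text({"QUB-HPC-CN-18": 8}), end="")
    print(mpirun_argv("hostfile.txt", 8,
                      r"C:\sw\migrate-3.2.15\migrate-n-mpi.exe",
                      "parmfile", "-nomenu"))
```

    Returning a list rather than one string keeps the executable path and the application arguments (parmfile, -nomenu) separate, which sidesteps quoting problems with paths that contain spaces.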

     

    My question: how do you schedule an OpenMPI job using the Windows HPC Job Manager?

    Do I need OpenMPI installed on the head node or just the compute nodes?

    Will 32-bit OpenMPI integrate with the Windows HPC Job Manager, i.e., so that there is no need to specify a hostfile?

     

    My initial test gives ORTE errors when I submit the command above to the cluster via the Job Manager:

    [QUB-HPC-CN-18:01720] [[33112,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\openmpi-1.5.4\orte\mca\ras\base\ras_base_allocate.c at line 147

    [QUB-HPC-CN-18:01720] [[33112,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\openmpi-1.5.4\orte\mca\plm\base\plm_base_launch_support.c at line 99

    [QUB-HPC-CN-18:01720] [[33112,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\openmpi-1.5.4\orte\mca\plm\ccp\plm_ccp_module.c at line 186

     

    Thanks

    Wednesday, August 24, 2011 1:57 PM

All replies

  • Hello Meelbeg,

    You should be able to run OpenMPI using the Microsoft HPC job scheduler. As for setting up OpenMPI across the cluster, I believe you need to install it on each compute node, but I suggest you check the OpenMPI documentation.

    What command do you use to submit the job?

    Thanks,

    James

    Monday, October 10, 2011 11:10 PM
  • Hi James,

    I had pretty much given up on this but have taken a second look.

    The ORTE errors go away when I add the MCA parameter "orte_ccp_headnode" to the mpirun command line. Here is an example of my job's task command:

    "C:\Program Files (x86)\OpenMPI_v1.5.4-win32\bin\mpirun" -mca orte_ccp_headnode QUB-HPC-HN-A   -np 4  \\w.x.y.z\home$\vpurnell\test\migrate-n-mpi.exe  parmfile -nomenu

     

    The job goes into the "Running" state; however, a prompt for a username and password appears (I see it in stdout), and then the program just sits there doing nothing. I guess this is an OpenMPI credential issue that I am now facing.

     

    Thanks,

    Meelbeg


    • Edited by Meelbeg Thursday, October 20, 2011 3:35 PM
    Thursday, October 20, 2011 3:33 PM