How to select cores from nodes to run a job on multiple nodes<br/>
<br/>
Hi all,<br/>
<br/>
I have 2 nodes in my cluster, with 4 cores on each node.<br/>
<br/>
I have an executable called <strong>sleep.exe</strong>. I submitted a job with <strong>job submit /numnodes:2 mpiexec -cores 2 sleep.exe</strong>, and it started 2 sleep.exe processes on each node.<br/>
<br/>
I also have a 4-core Ansys CFX job, and I want to run it on 2 cores of the first node and 2 cores of the second node.<br/>
<br/>
I tried <strong>job submit /numnodes:2 /workdir:&lt;working directory path&gt; /stdout:out.log /stderr:error.log mpiexec -cores 2 cfx5solve.exe -v -def &lt;.def file&gt; -start-method MSMPI -part 4</strong>. The job failed and wrote the following errors to error.log:<br/>
<br/>
<strong>&quot;An error has occurred in cfx5solve:<br/>
<br/>
Error reported by IO module: readIntFmtData: (fgets failed) syserr:: No<br/>
error<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Error reported by IO module: iif_set_lock: error reading lock file<br/>
//litocmaster/work/benchmark.def.lck: No error<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
Can't call method &quot;name&quot; on an undefined value at C:\Program Files\ANSYS Inc\v110\CFX\bin\/perllib/CFX5/Job/Settings.pm line 2464.<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
Can't call method &quot;name&quot; on an undefined value at c:\Program Files\ANSYS Inc\v110\CFX\bin\/perllib/CFX5/Job/Settings.pm line 2464.<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
Can't call method &quot;name&quot; on an undefined value at c:\Program Files\ANSYS Inc\v110\CFX\bin\/perllib/CFX5/Job/Settings.pm line 2464.<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
An error has occurred in cfx5solve:<br/>
<br/>
Neither Start Command nor Option is defined for start method MSMPI; check<br/>
that you have given the method name correctly.<br/>
<br/>
Can't call method &quot;name&quot; on an undefined value at C:\Program Files\ANSYS Inc\v110\CFX\bin\/perllib/CFX5/Job/Settings.pm line 2464.&quot;</strong><br/>
<br/>
But when I submit the job without the <strong>mpiexec</strong> option, it runs fine on the available resources.<br/>
<br/>
Does <strong>mpiexec</strong> work with all applications or not? Please give me suggestions on this. Has anybody tested this kind of scenario with the Starccm application?<br/>
<br/>
Regards,<br/>
P. Kalyan Rao<br/>
<br/>
© 2009 Microsoft Corporation. 
All rights reserved.<br/>
Last updated: Wed, 24 Jun 2009 18:54:29 Z<br/>
<br/>
Question: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#18ffa63d-e517-44c3-bdf3-ea15f6910fa9<br/>
Author: pkalyanrao (http://social.microsoft.com/Profile/en-US/?user=pkalyanrao)<br/>
Posted: Wed, 17 Jun 2009 07:29:54 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#7b44a14a-8d1b-47bf-96a9-266657d72d9d<br/>
Author: Lio (http://social.microsoft.com/Profile/en-US/?user=Lio)<br/>
<br/>
Hi,<br/>
<br/>
Is this the complete error output? It seems that the application, cfx5solve.exe, is bailing out even before MPI_Init. Please contact ANSYS support. Some applications want to run with a specific core configuration and check it internally. That might be the case here.<br/>
<br/>
Thanks,<br/>
.Erez<br/>
<br/>
P.S. Does it run correctly when you remove the &quot;-cores 2&quot; switch and replace /numnodes:2 with /numcores:8? 
(2 nodes using all cores)<br/>
<br/>
Posted: Thu, 18 Jun 2009 16:26:30 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#621c4d8d-834c-4e92-af66-4620691b92bc<br/>
Author: pkalyanrao (http://social.microsoft.com/Profile/en-US/?user=pkalyanrao)<br/>
<br/>
Hi Lio,<br/>
<br/>
When I put the mpiexec option before the cfx5solve command, the job just finishes. Without the mpiexec option, the job runs fine.<br/>
<br/>
Thanks,<br/>
P. Kalyan Rao<br/>
<br/>
Posted: Fri, 19 Jun 2009 10:28:53 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#ce33b5ea-aa83-45b4-b8b1-3f39f8c1b392<br/>
Author: James Ren (http://social.microsoft.com/Profile/en-US/?user=James%20Ren)<br/>
<br/>
Hi Kalyan, could you check the -start-method option? The output of cfx5solve -help includes the following:<br/>
<br/>
-start-method &lt;name&gt;<br/>
    Use the named start method to start the solver.  This option<br/>
    allows you to use different parallel methods, as listed in the<br/>
    Solver Manager GUI or in the etc/start-methods.ccl file, instead<br/>
    of the defaults.  For parallel start methods, you must also provide<br/>
    the -part or -par-dist arguments.<br/>
<br/>
Also, this option should be quoted. 
I've tried -start-method <strong>&quot;MPICH2 Local Parallel for Windows&quot;</strong> and it seems to work for me.<br/>
<br/>
Thanks,<br/>
James<br/>
<br/>
Posted: Fri, 19 Jun 2009 23:03:02 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#522904a8-847e-4b15-ab55-ffdcf366fff5<br/>
Author: pkalyanrao (http://social.microsoft.com/Profile/en-US/?user=pkalyanrao)<br/>
<br/>
Hi James,<br/>
<br/>
I am using the -start-method MSMPI option. How should I use it in this situation?<br/>
<br/>
Thanks,<br/>
P. Kalyan Rao<br/>
<br/>
Posted: Sat, 20 Jun 2009 04:15:03 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#288c6dbb-1f61-4484-8130-16fbe5087bf6<br/>
Author: James Ren (http://social.microsoft.com/Profile/en-US/?user=James%20Ren)<br/>
<br/>
<p>Hi Kalyan,<br/>
<br/>
You can't use MSMPI as the name for -start-method. First find the file start-methods.ccl on your computer. It lists the START METHOD options and the corresponding usage. Find the one for Windows. 
For example, if you want to run your job in parallel, you need to use: -start-method &quot;MPICH2 Distributed Parallel for Windows&quot;<br/>
<br/>
Thanks,<br/>
James<br/>
</p>
Posted: Sat, 20 Jun 2009 07:21:34 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#8723ecf6-759c-438b-a7d8-e6babe675380<br/>
Author: pkalyanrao (http://social.microsoft.com/Profile/en-US/?user=pkalyanrao)<br/>
<br/>
Hi James,<br/>
<br/>
Ansys CFX 11 with SP1 supports Windows HPC, so we can use the MSMPI option with -start-method. My customers run CFX on a Windows HPC cluster with the MSMPI option only. The command we use in the task list is: <strong>cfx5solve -v -def &lt;Input file name&gt; -start-method MSMPI -part &lt;number of Processors&gt;</strong>.<br/>
With that command, the job runs on the available cores.<br/>
<br/>
But the job submission wizard has no option to select specific cores from a specific node. That is possible with <strong>mpiexec -cores</strong>, so I tried adding these options before the <strong>cfx5solve</strong> command. The result was the error output in my first post.<br/>
<br/>
Thanks and Regards,<br/>
P. Kalyan Rao<br/>
<br/>
Posted: Sun, 21 Jun 2009 06:40:01 Z<br/>
<br/>
Reply: http://social.microsoft.com/Forums/en-US/windowshpcmpi/thread/18ffa63d-e517-44c3-bdf3-ea15f6910fa9#b9b7c7d4-7b95-4c0b-a462-00a3fa3b4989<br/>
Author: Lio (http://social.microsoft.com/Profile/en-US/?user=Lio)<br/>
<br/>
Hi Kalyan,<br/>
<br/>
Correct, the scheduler has no options for process placement; the available options are /numnodes, /numcores and /numsockets (the last one allocates a process per socket, and you can use it with the mpiexec -affinity option).<br/>
<br/>
Another option for you is to manipulate the CCP_NODES environment variable before calling mpiexec. That is, submit a script that changes CCP_NODES and then calls mpiexec.<br/>
<br/>
I assume that cfx5solve sees an inconsistency between the MPI world size and CCP_NODES and bails out. I can't see any other reason why the app behaves differently with and without this switch.<br/>
<br/>
Thanks,<br/>
.Erez<br/>
<br/>
Posted: Wed, 24 Jun 2009 18:54:29 Z
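
The last suggestion in the thread (a script that changes CCP_NODES and then calls mpiexec) could look roughly like the Python sketch below. It rests on assumptions the thread does not confirm: that CCP_NODES has the layout "<node count> <name1> <cores1> <name2> <cores2> ...", and that the launched command picks up the rewritten value. The helper names limit_cores_per_node and run_with_limited_cores, and the node names NODE1 and NODE2, are hypothetical.

```python
import os
import subprocess


def limit_cores_per_node(ccp_nodes, max_cores):
    """Cap the core count of every node in a CCP_NODES-style string.

    Assumed layout (not confirmed by the thread):
    "<node count> <name1> <cores1> <name2> <cores2> ..."
    """
    fields = ccp_nodes.split()
    if not fields:
        raise ValueError("CCP_NODES is empty")
    count = int(fields[0])
    pairs = fields[1:]
    if len(pairs) != 2 * count:
        raise ValueError("CCP_NODES does not match the assumed layout")
    result = [str(count)]
    for i in range(0, len(pairs), 2):
        name, cores = pairs[i], int(pairs[i + 1])
        # Never raise a node above what the scheduler allocated.
        result += [name, str(min(cores, max_cores))]
    return " ".join(result)


def run_with_limited_cores(command, max_cores):
    """Run command (a list of arguments) with a rewritten CCP_NODES.

    Only the child process sees the changed variable; the caller's
    environment is left untouched.
    """
    env = dict(os.environ)
    env["CCP_NODES"] = limit_cores_per_node(env["CCP_NODES"], max_cores)
    return subprocess.call(command, env=env)
```

With the assumed layout, limit_cores_per_node("2 NODE1 4 NODE2 4", 2) returns "2 NODE1 2 NODE2 2". A submitted wrapper script would then call run_with_limited_cores with the mpiexec command line as a list of arguments. Whether cfx5solve accepts the result is not tested here.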