What's a good way to start several tasks that need to talk to each other?

  • Question

  • I am testing a Windows HPC cluster of 4 nodes, each with 12 cores. The 'application' I am testing requires a single 'Master' (which takes 1 core and runs continuously for days) and 47 slaves, each of which also runs for days.

    Each slave has 2 parts that must both be active at the same time, though they take turns doing actual work. Both have rather long startup times, so they need to stay running; however, only one part uses CPU at a time, and each part is idle ~50% of the time. That is, part A runs for several minutes, then part B runs for a few minutes, then A again, then B, and so on. The two parts of the slave communicate with each other, and part A also communicates with the master.

    How do I submit a job (or jobs) that will start the slaves so that the 2 parts of each slave end up on the same node?

    What I am hoping to achieve is something like this:

    • node 1: master and 11 slaves (i.e. 11 instances of A.bat and the corresponding 11 instances of B.bat)
    • node 2: 12 slaves (12 instances of A.bat and the corresponding 12 instances of B.bat)
    • node 3: 12 slaves (12 instances of A.bat and the corresponding 12 instances of B.bat)
    • node 4: 12 slaves (12 instances of A.bat and the corresponding 12 instances of B.bat)
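
    The layout above is just core-counting arithmetic: because parts A and B of a slave alternate, each slave is budgeted one core, and the master takes one core on node 1. A minimal sketch of that arithmetic follows; the function name plan_layout and its parameters are made up for illustration and are not part of any HPC tool.

```python
def plan_layout(nodes=4, cores_per_node=12, masters=1):
    """Return a list of (node_number, masters_on_node, slaves_on_node).

    Assumption: one core per slave (its A and B parts take turns),
    and the master occupies one core on node 1.
    """
    layout = []
    for node in range(1, nodes + 1):
        masters_here = masters if node == 1 else 0
        slaves_here = cores_per_node - masters_here
        layout.append((node, masters_here, slaves_here))
    return layout


if __name__ == "__main__":
    for node, masters_here, slaves_here in plan_layout():
        print(
            f"node {node}: {masters_here} master, {slaves_here} slaves "
            f"({slaves_here} x A.bat + {slaves_here} x B.bat)"
        )
```

    With the defaults this gives 11 slaves on node 1 and 12 on each of the other three nodes, 47 in total.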

    Obviously I am new to HPC job management, so any help would be much appreciated. I have tried a .bat that starts A.bat and B.bat, but it just hangs.

    Thanks

     

    PS: This is HPC 2008, not R2.

    Wednesday, November 24, 2010 7:09 AM

Answers

  •  

    I found out why the job was hanging on the START command in the batch script.

    There was a slight error in the path: I had 'start B.bat' when it should have been 'start ..\B.bat'.

    It is working a lot better now, though I will keep the clusrun approach in mind as well.

    Thanks

    PS: I put the 'start' line for B.bat into A.bat, just after the local environment variables are set up and before the .exe that A.bat starts.
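
    For anyone hitting the same thing: the underlying problem was a relative path that did not resolve from the directory A.bat runs in. As a rough illustration only (this is Python, not the batch script I actually used, and the names resolve_script and launch_detached are made up), checking the path before launching turns a silent hang into an immediate error:

```python
import subprocess
from pathlib import Path


def resolve_script(working_dir, relative_path):
    """Resolve relative_path against working_dir; fail loudly if it is missing."""
    script = (Path(working_dir) / relative_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(
            f"cannot launch {relative_path!r}: {script} does not exist"
        )
    return script


def launch_detached(working_dir, relative_path):
    """Start the script without waiting for it to finish.

    This is only loosely comparable to `start` in a batch file.
    Assumption: Windows, where `cmd /c` runs a .bat file.
    """
    script = resolve_script(working_dir, relative_path)
    return subprocess.Popen(["cmd", "/c", str(script)], cwd=str(script.parent))
```

    Here resolve_script(slave_dir, "B.bat") would raise FileNotFoundError when B.bat lives one level up, while resolve_script(slave_dir, "../B.bat") would succeed.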

    Thursday, November 25, 2010 6:37 AM

All replies

  • Hi kbam,

    How did you try to start A.bat and B.bat from a single .bat script? Did you use the 'start' command and some blocking statement at the end of the script?

    Another thing that may be useful is the clusrun command and the corresponding IRemoteCommand API. Clusrun tasks can always be run by an administrator, even if all the resources are already reserved for regular jobs. You can run 12 clusrun jobs running A.bat (each will run a single instance of A.bat per node), 12 clusrun jobs running B.bat, and 1 clusrun job (with the '/nodes:node1' parameter) running the master.

    More about clusrun: http://technet.microsoft.com/en-us/library/cc947685(WS.10).aspx

    More about IRemoteCommand API: http://msdn.microsoft.com/en-us/library/cc853432(v=VS.85).aspx

    Thanks,
    Łukasz

    Wednesday, November 24, 2010 3:06 PM