I am thinking of using the Scheduler in the Cloud, but I suppose this question applies equally to creating jobs and tasks and bursting them to Azure, or to using SIC.
Say I have a cluster of 10 Azure nodes running and waiting for work, and I create a job with one task. Let's say there is an executable on each node called RunMeNow.exe. By default, out of the box, if I create a job with one task to run RunMeNow.exe, will it run the executable on all nodes simultaneously?
I understand that there are options to:
1. Set the number of cores. If I set it to 2 cores and ran the task (assuming my nodes were small VMs, which have 2 cores each), then the task should run on one VM only.
2. Min and max resources. If I set the max to 8 cores, would the task run on 4 of the VMs?
I'm trying to figure out, when I set up a job with a task, how I can make the task run on multiple VMs simultaneously.
The scenario you describe is no different from having the same configuration on an on-premises cluster. The application you mentioned, RunMeNow, is a regular application, so if you create a job with one task to run this app and request 10 Azure nodes, it will run on ONE of the Azure nodes. If you wanted to run the application on all 10 Azure nodes, you would need to create a job with 10 tasks, each referencing the application.
Answers to 1 & 2:
1. Yes, the task will run on only one VM.
2. If you set the maximum to 8 cores and run the job, the task will still run on ONE of the VMs. Your job will be allocated up to 8 cores' worth of Azure nodes, but the task will only run on one of them.
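To make the distinction concrete, here is a small Python sketch of the allocation behavior described above. This is a toy model, not the HPC Pack scheduler itself; the node names, core counts, and the `schedule` helper are all illustrative. It shows why a job granted 8 cores across four small VMs still runs a single task on just one of them, while four tasks fan out across all four:

```python
# Toy model of core-based scheduling: a job may be granted many
# nodes, but each task occupies only as many cores as it asks for,
# on a single node. Names here are illustrative, not HPC Pack APIs.

CORES_PER_SMALL_VM = 2

def schedule(tasks, allocated_nodes, cores_per_task):
    """Greedily place each task on the first node with enough free
    cores. Returns a mapping of task name -> node name."""
    free = {node: CORES_PER_SMALL_VM for node in allocated_nodes}
    placement = {}
    for task in tasks:
        for node in allocated_nodes:
            if free[node] >= cores_per_task:
                free[node] -= cores_per_task
                placement[task] = node
                break
    return placement

nodes = [f"AzureNode{i}" for i in range(1, 5)]  # 4 small VMs = 8 cores

# One task in the job: even with 8 cores allocated, it lands on one VM.
print(schedule(["RunMeNow"], nodes, cores_per_task=2))

# Four tasks: only now does the job use all four VMs simultaneously.
print(schedule([f"RunMeNow-{i}" for i in range(4)], nodes, cores_per_task=2))
```

The point of the model is that the core request is per task, not per job: raising the job's maximum only enlarges the pool of nodes, and it takes multiple tasks to actually occupy that pool.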
You can try this yourself using hostname as an example task. I made a short video that I hope helps:
- Proposed as answer by scorpiotek Wednesday, February 22, 2012 12:58 AM
OK, so the two replies here appear to conflict. Michael says that if I set the number of nodes to 4, it will run on 4 of the VMs. You are saying that regardless of how many cores I set, it still runs on one VM. I am thinking of the case where a small VM has 2 cores, so 8 cores would mean 4 VMs.
So I believe that in order to get a single task to run on all VMs, you would have to have a parametric sweep application and give it a range of data to pull from. In that case, it should use the available 4 VMs, right?
After thinking about my original question more, I suppose it would not make sense (unless I was using a parametric sweep) to tell an application to run and then expect it to run on all the nodes, because you would normally expect the nodes to each have different data on them anyway.
So I believe that in order to get a single task to run on all VMs, you would have to have a parametric sweep application and give it a range of data to pull from. In that case, it should use the available 4 VMs, right?
That is exactly right... when you have a Parametric Sweep job, you are creating, via one task definition, many tasks with different inputs that will run on different resources (nodes, cores, sockets, whatever you request at the job level).
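A rough sketch of that expansion, as a toy model rather than anything from HPC Pack's internals (the `expand_sweep` helper and the `*` placeholder convention are illustrative assumptions): one sweep definition with a range becomes many concrete tasks, each with its own index, which the scheduler can then spread across however many nodes the job was granted.

```python
# Toy expansion of a parametric sweep: one command template with a
# '*' placeholder and a range becomes N independent tasks.

def expand_sweep(command_template, start, end, step=1):
    """Substitute each index in [start, end] for '*' in the
    template, yielding one concrete command line per task."""
    return [command_template.replace("*", str(i))
            for i in range(start, end + 1, step)]

tasks = expand_sweep("RunMeNow.exe /input:data*.txt", 1, 4)
for t in tasks:
    print(t)
# With 4 tasks and 4 VMs available, all four can run at the same time.
```

Because each expanded task is independent, the scheduler is free to place them on separate VMs simultaneously, which is exactly the behavior a single ordinary task cannot give you.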
As far as your second remark goes, tasks not having the data they need *could* be an issue. With SP2, you now have "Node Preparation Tasks" that run on each node where the application is scheduled to run. This type of task can set up whatever the node needs before your actual tasks start running.
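As a sketch of the ordering this gives you (again a toy model, not the actual scheduler; the node and command names are made up): on any given node, the preparation step runs once before that node's share of the regular tasks.

```python
# Toy ordering model for a node preparation task: on each node the
# prep step runs first, then that node's regular tasks in order.

def run_on_node(node, prep_task, tasks):
    """Return the execution order on one node: prep, then tasks."""
    return [f"{node}: {prep_task}"] + [f"{node}: {t}" for t in tasks]

order = run_on_node("AzureNode1", "StageData.cmd",
                    ["RunMeNow-0", "RunMeNow-1"])
for step in order:
    print(step)
```

So a sweep's tasks can assume their input data is already staged, because the preparation step is guaranteed to have finished on that node first.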
Hope I did not confuse you more.