March 16, 2009, 17:35
I have a working HPC SOA service that performs well in most of our scenarios.
However, we are getting problems when the "tasks" within the SOA session require large amounts of memory.
We are experiencing failures due to memory exhaustion. The session is set up to resource by core, since each "task" is single-threaded. Some sessions have high memory requirements, but HPC doesn't appear to take this into account when allocating work to compute nodes, and there doesn't seem to be a way to tell HPC the likely memory requirements before submitting the tasks.
(By large amounts of memory I mean somewhere between 2 and 4GB for each "task").
Does anyone have any ideas how we can get HPC to work for us for these cases?
March 24, 2009, 3:54
Here's one option:
1. Create a node group called BigMemNodes that contains only nodes with more than 4GB of memory.
2. When creating a session, restrict it to that group:
SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);
info.NodeGroups = new StringCollection();
info.NodeGroups.Add("BigMemNodes");
How to create a node group:
1. Launch the Admin Console.
2. Navigate to the Node Management pane.
3. Select a node with at least 4GB of RAM, right-click, and select "Groups" -> "New Group".
4. In the dialog that appears, name the node group "BigMemNodes".
5. Then multi-select the remaining nodes with at least 4GB of RAM, right-click, and select "Groups" -> "BigMemNodes"; those nodes are added to the group.
- Proposed as answer by Don Pattee (Moderator), February 5, 2011, 0:57
March 24, 2009, 10:33
Thanks for the response, Ming.
I see now that I haven't explained the situation very well.
Let me try and explain it a little better:
Say all of our grid machines are quad-core with 8GB of RAM (2GB per core).
For the vast majority of our workloads running 4 "tasks" per machine easily fits within the RAM available.
Occasionally we have a job (SOA session) where each "task" needs, say, 4GB of RAM. In these cases we want two tasks to run per node. If we resource the job (session) by node, we get poor utilisation of our grid resources (one task per node) and the job takes twice as long as the optimal distribution of tasks (two tasks per node).
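As a back-of-the-envelope illustration of that utilisation gap (a plain Python sketch, not HPC-specific; the function name is made up for illustration), the number of tasks that actually fit on a node is bounded by both its cores and its memory:

```python
def tasks_per_node(cores, node_mem_gb, task_mem_gb):
    """Tasks that fit on one node, limited by cores and by memory."""
    return min(cores, node_mem_gb // task_mem_gb)

# Quad-core, 8GB nodes from the example above:
print(tasks_per_node(4, 8, 2))  # normal 2GB tasks -> 4 per node
print(tasks_per_node(4, 8, 4))  # big 4GB tasks    -> 2 per node
# Resourcing the big-memory job by node runs only 1 task per node,
# so it takes twice as long as this memory-aware packing.
```

The ideal scheduler behaviour the question asks for is exactly this min(cores, memory) calculation done per node at allocation time.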
I hope that illustrates better the problem we are trying to address.