Help with some fundamentals

  • General discussion

  • Hello,

     

    I am currently working on a Win32 program that makes some intensive calculation, and is already written to be multithreaded. As a result, it uses all the available cores on the PC it runs on.

    The basic behavior is for the user to open a model, click the “start” button, then the threads are spawned, and once all is finished, control is given back to the user.

    While this works great, we have found that for larger models the computation time is limited by the number of cores: there are always more tasks that could run in parallel than there are cores to run them on.

    As a result, we are investigating the possibility of using grid computing to multiply the number of available cores.

    This, of course, has technical challenges and reading documentation on various websites led me to the Windows HPC one and to this forum.

    I’m not sure it’s the appropriate place to ask my questions, but should it not be the case, please tell me what an appropriate place might be.

     

    I understand that HPC has some kind of a framework that would facilitate the communication between the user’s computer and the nodes that perform the distributed tasks.

    What I have a hard time grasping is the following:

     

    What communication layer is used? How do I choose it?

     

    What is the behavior in case a node dies or becomes unreachable?

     

    What makes any given machine become a node available for tasks?

     

    Is there some sort of load balancing ?

     

    Is there a monitoring tool that would give me indications of the status and health of the nodes?

     

    How do I decide which part of my application code goes "distant" ? How does it get transferred to the nodes? If I understand things correctly, I would have to write a separate command line exe that takes care of the tasks and this would be the exe that gets sent over to node.

     

    I’m quite sure all these are trivial questions for those with more experience, but I’m having a hard time finding resources that would answer those.

     

    Thanks in advance for your help

    Olivier

    Thursday, January 20, 2011 3:10 PM

All replies

  • For your questions:

    What communication layer is used? How do I choose it?

     [yiding] HPC SOA uses WCF as its communication layer; WCF is Microsoft's implementation of the WS-* standards. This means we support Java/C++ interop, even on non-Windows platforms.

    What is the behavior in case a node dies or becomes unreachable?

     [yiding] In the HPC SOA infrastructure, a broker node handles compute node and network failures. If a failure occurs, the calculation is retried on other computers until a configurable retry limit is reached.
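
The retry behavior described above can be illustrated with a small sketch. This is not the HPC SOA broker or its API: `NodeFailure`, `run_with_retry`, `dead_node` and `healthy_node` are hypothetical names, and nodes are modeled as plain Python callables. It only shows the idea of moving a failed calculation to another node until a configurable retry limit is reached.

```python
class NodeFailure(Exception):
    """Raised by a (hypothetical) node that died or is unreachable."""


def run_with_retry(request, nodes, retry_limit=3):
    """Run `request` on one node, retrying on other nodes after a failure.

    `retry_limit` is the number of retries allowed after the first attempt,
    so at most `retry_limit + 1` attempts are made in total.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    last_error = None
    for attempt in range(retry_limit + 1):
        # Move on to the next node for every new attempt.
        node = nodes[attempt % len(nodes)]
        try:
            return node(request)
        except NodeFailure as err:
            last_error = err
    raise RuntimeError(
        f"request failed after {retry_limit + 1} attempts"
    ) from last_error


def dead_node(value):
    # Stand-in for a node that cannot be reached.
    raise NodeFailure("node unreachable")


def healthy_node(value):
    # Stand-in for the real calculation.
    return value * value


# The first attempt fails on dead_node, the retry succeeds on healthy_node.
result = run_with_retry(7, [dead_node, healthy_node], retry_limit=2)
```

If every node keeps failing, the sketch gives up with a `RuntimeError` once the limit is exhausted, which is the point where a real client would have to report the calculation as failed.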

    What makes any given machine become a node available for tasks?

     [yiding] You'll need an HPC cluster to use the HPC SOA infrastructure. Any node in the HPC cluster will be available for calculations. HPC clusters support desktop CPU scavenging, meaning that you can add desktop computers running Windows 7 to the cluster.

    Is there some sort of load balancing ?

    [yiding] The broker node uses a round-robin algorithm to distribute calculations across all nodes. The HPC scheduler also has an advanced resource scheduling mechanism.
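
As a rough illustration of round-robin distribution in general (not the broker's actual implementation), the sketch below hands each calculation to the next node in a fixed rotation. The function name `assign_round_robin` and the node names are made up for the example.

```python
from itertools import cycle


def assign_round_robin(requests, nodes):
    """Assign each request to the next node in a fixed rotation."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    rotation = cycle(nodes)  # node-a, node-b, node-c, node-a, ...
    for request in requests:
        assignment[next(rotation)].append(request)
    return assignment


# Seven calculations spread over three hypothetical nodes.
plan = assign_round_robin(range(7), ["node-a", "node-b", "node-c"])
```

Round-robin ignores how long each calculation takes, so it balances the number of requests per node rather than the actual load.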

    Is there a monitoring tool that would give me indications of the status and health of the nodes?

    [yiding] HPC Cluster Manager/Job Manager provides a view of node health in the cluster, plus a session progress view for the calculation.

    How do I decide which part of my application code goes "distant" ? How does it get transferred to the nodes? If I understand things correctly, I would have to write a separate command line exe that takes care of the tasks and this would be the exe that gets sent over to node.

    [yiding] The answer is “the calculation-heavy part that can be parallelized”! You’ll need to deploy your code to the HPC cluster. You don’t need a separate CLI, though. If you have the HPC SOA debugger installed, it’ll give you a good hello-world sample and help you do most of the deployment work.

    Sunday, January 30, 2011 5:55 AM
  •  

    Thank you for your answers.

    I see that it uses WCF for the communication. But what if I write in a language that is not supported by .NET but can still create command-line tools?
    Basically, I was thinking of sending the exe and the input file, and telling HPC to run the exe with that input file. The exe would then create an output file that I would retrieve.
    Is this feasible?
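
To make the idea concrete, here is a minimal sketch of the worker side only: a command-line style entry point that reads an input file and writes an output file. It is written in Python purely for illustration (the real tool would be a native exe), `compute` is a placeholder for the actual calculation, the file names are invented, and how the program and its files reach a node is not shown.

```python
import sys
import tempfile
from pathlib import Path


def compute(values):
    """Placeholder for the real calculation: a sum of squares."""
    return sum(v * v for v in values)


def main(argv):
    """Entry point in the style of: worker INPUT_FILE OUTPUT_FILE."""
    if len(argv) != 3:
        print("usage: worker INPUT_FILE OUTPUT_FILE", file=sys.stderr)
        return 2
    input_path, output_path = Path(argv[1]), Path(argv[2])
    # One number per line; blank lines are ignored.
    lines = input_path.read_text().splitlines()
    values = [float(line) for line in lines if line.strip()]
    output_path.write_text(f"{compute(values)}\n")
    return 0


# Local dry run: write an input file, run the worker, read the result back.
with tempfile.TemporaryDirectory() as tmp:
    in_file = Path(tmp) / "model.in"
    out_file = Path(tmp) / "model.out"
    in_file.write_text("1\n2\n3\n")
    exit_code = main(["worker", str(in_file), str(out_file)])
    result_text = out_file.read_text()
```

Keeping all inputs and outputs in files, with a non-zero exit code on error, means the same program can be tested locally before any attempt to run it remotely.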

    With regard to the SOA Debugger, how do I get access to it? I could not find any download links on the HPC homepage. Does this mean I first need to purchase HPC through some license agreement before I can evaluate it?

    Thanks
    Olivier

    Monday, January 31, 2011 3:57 PM