How to access a shared folder on client machine (or headnode) from Azure worker nodes in an HPC grid?

  • Question

  • I have an app that uses an HPC grid to perform long-running calculations. The grid nodes work by reading data from a shared folder on the client machine, then writing a file back to this directory for the client to process. This works fine with on-premises HPC nodes, but I'm trying to get it to work with Azure worker nodes on the grid as well.

    I can add Azure worker nodes using a node template, and a SOA service loading test passes, but at runtime the worker nodes have no read or write access to either the client machine or the HPC head node. What is the best way to proceed from here? Is it possible to set up a connection with read/write access between the worker nodes and the client machine? Failing that, file access to the HPC head node could also work.

    Perhaps using Azure blob storage as an intermediary between the client and the worker nodes would work? All suggestions welcomed, thanks.



    Friday, August 26, 2011 9:43 AM

All replies

  • Hi Fraser,

    If you can expose your client/head node to the internet, i.e., let the Azure nodes access a folder share on it, you will have a better chance of setting up read/write access between the worker nodes and the client machine. Note that it is not yet possible to directly access a folder share on the worker nodes.

    Your idea of using Azure blob storage could work: you will need to create an Azure storage account, have the worker nodes write to Azure storage, then retrieve the data on your client from Azure storage.

    For how to access Azure blob storage, please refer to http://msdn.microsoft.com/en-us/library/ee691964.aspx
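    As an illustration, the worker-writes/client-reads round trip through blob storage could look like the following sketch, using the azure-storage-blob Python package (this thread predates that SDK; the connection string, container, and blob names are all hypothetical):

```python
def upload_result(conn_str, container, blob_name, data):
    """Worker side: write result bytes to a blob (needs the azure-storage-blob package)."""
    from azure.storage.blob import BlobServiceClient  # deferred so the sketch parses without the SDK installed
    service = BlobServiceClient.from_connection_string(conn_str)
    service.get_blob_client(container=container, blob=blob_name).upload_blob(data, overwrite=True)

def download_result(conn_str, container, blob_name):
    """Client side: read the worker's result bytes back from the same blob."""
    from azure.storage.blob import BlobServiceClient
    service = BlobServiceClient.from_connection_string(conn_str)
    return service.get_blob_client(container=container, blob=blob_name).download_blob().readall()
```

    The client only needs the storage account connection string and the agreed container/blob naming scheme, so neither the client nor the head node has to be reachable from Azure.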




    Friday, August 26, 2011 6:34 PM
  • While a file share over a VPN to Azure would work, I think Azure blob storage is a better solution in terms of performance and reliability. The HPC client pack comes with the hpcpack/hpcsync utilities, which you can use to upload/download data. A typical scenario would be:

    • Create package on your client (hpcpack create yourfile.zip files...)
    • Upload the package to Azure (hpcpack upload yourfile.zip /scheduler:headnode /nodetemplate:templatename /relativepath:...)
    • Add a "node preparation task" to your SOA job (hpcsync)
    • Your service running on Azure node can now read/write file locally
    • Add a "node release task" to pack the result file locally and upload it to Azure ("hpcpack create" & "hpcpack upload")
    • Get the package to your client with any Azure storage tool.

    NOTE: hpcpack is not available on Azure nodes; you need to deploy it manually.

    NOTE: depending on the file size and data access pattern, you might just send the content of the file in the message.
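    The last note can be sketched concretely: compress the file content and, if it comes out small enough, embed it in the request message itself; otherwise fall back to blob storage. This is a sketch only, where the 1 MB cutoff and the payload dictionary shape are illustrative assumptions, not part of the HPC SDK:

```python
import base64
import zlib

INLINE_LIMIT = 1024 * 1024  # illustrative 1 MB cutoff for in-message payloads

def prepare_payload(data):
    """Compress the file content and decide whether it can ride in the
    request message itself or should go through blob storage instead."""
    compressed = zlib.compress(data)
    if len(compressed) <= INLINE_LIMIT:
        # Small enough: embed directly in the request as text.
        return {"mode": "inline", "content": base64.b64encode(compressed).decode("ascii")}
    # Too big: the caller should upload to blob storage and send a reference.
    return {"mode": "blob", "size": len(compressed)}

def restore_payload(payload):
    """Service side: recover the original bytes from an inline payload."""
    assert payload["mode"] == "inline"
    return zlib.decompress(base64.b64decode(payload["content"]))
```

    Since the input is XML, compression usually shrinks it considerably, which makes the inline path viable for many of the smaller trial files.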

    Monday, August 29, 2011 5:57 AM
  • Hi Michael,

    Thanks for the answer. Unfortunately, it is not really an option to expose the client or headnode to the internet, due to where this application will be running. I don't need to access shared folders on the workers, rather the workers need to access a shared folder on ideally the client machine, or failing that the headnode. I will look into Azure blob storage; see my reply to yidingz below for more info, thanks.

    Monday, August 29, 2011 9:54 AM
  • Hi yidingz, thanks for the information.

    The application in question is a Monte Carlo simulation involving thousands of trials, and the initial file can range in size from very small to several megabytes of XML. However, I am a little concerned about introducing a second stage of data transfer at the end of each trial. Currently I write the results directly back to a shared folder on the client machine, whereas I would now have to pack up the results, upload them to a blob, then download and uncompress them on the client. I'll give it a go as a proof of concept, but I'm apprehensive about the speed.
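    For the pack/unpack ends of that round trip, the standard library is enough; a minimal sketch (the file names are illustrative, and on the node itself hpcpack create would play the packing role):

```python
import io
import zipfile

def pack_results(files):
    """Worker side: bundle result files (name -> bytes) into one zip for upload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()

def unpack_results(blob):
    """Client side: restore the result files after downloading the package."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

    Batching several trials' results into one package per upload would also cut the per-transfer overhead that the extra stage introduces.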

    Monday, August 29, 2011 10:14 AM
  • I agree that wrapping things up in several steps complicates things. Is it possible to include the data in the request? 1 MB should be fine, so try to include the data as part of the request. Also, is the data common to all requests? If so, you can upload it once before the session starts.
    Monday, August 29, 2011 1:54 PM