Passing large message in an SOA HPC service

  • Question

  • Hello, 

    I am trying to write a service that requires a large amount of data to be passed from the client to the compute nodes. There is a large amount of common data shared between the compute nodes, as well as a large amount of data for tasks specific to each compute node.

    I have seen that there is a common data construct for shared data. However, in the documentation at https://msdn.microsoft.com/en-us/library/cc907051(v=vs.85).aspx#object_reference, it is mentioned that references to data can be passed to reduce serialisation time. Would you be able to point me to where I might find more details on doing this?

    Thanks.

    Thursday, June 8, 2017 9:19 AM

All replies

  • Hi Chris,

      How big is your data in both cases? How many requests will carry large data? And how big is the cluster? All of this information matters. Embedding the data in the request is costly when it goes to hundreds of nodes, but acceptable for just a few nodes.


    Qiufang Shi

    Thursday, June 8, 2017 7:10 PM
  • Hi Qiufang,

    I have listed the answers to your questions below.

     * How big is your data in both cases?

    Our data is quite massive. Each compute node requires up to 50 GB of data that is specific to it. In addition, there is up to another 50 GB of data to be shared between all compute nodes.

     * And how many requests will have large data?

    In our case, all requests will contain (either directly or indirectly) the large amount of data described above.

     * And how big is the cluster?

    The clusters we are using are composed of a small set of powerful nodes: from 4 to 30 nodes, where each node has 256 GB to 2 TB of RAM.
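
    As a rough sense of scale, the figures above can be combined into an upper bound on the data that moves through the file share. The sketch below is only arithmetic on the numbers quoted in this post; the function name is hypothetical, and it assumes each node reads its own file and the shared file exactly once.

```python
def share_traffic_gb(nodes: int, per_node_gb: int = 50, shared_gb: int = 50) -> dict:
    """Upper-bound traffic when every node reads its own file plus the shared file."""
    written = shared_gb + nodes * per_node_gb   # shared file is written only once
    read = nodes * (shared_gb + per_node_gb)    # each node reads both files once
    return {"written_gb": written, "read_gb": read}


# Smallest and largest cluster sizes mentioned above.
for n in (4, 30):
    print(n, share_traffic_gb(n))
```

    For 30 nodes this comes to up to 1550 GB written and 3000 GB read.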

    At the moment we are transferring this data by serialising it to a file on a shared network drive. The file path is then passed to the compute nodes in the request to allow the files to be read by the compute nodes.

    This process is done for shared and non-shared data. The shared data is written once and read many times, whereas the non-shared data are written to separate files and read once.
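
    For illustration, the hand-off described above might look roughly like the following Python sketch. It is not our actual code: the file names, the `make_request` and `handle_request` helpers, the use of `pickle`, and the per-process cache for the shared data are all assumptions made for the example.

```python
import pickle
from pathlib import Path


def write_once(data, path: Path) -> Path:
    """Serialise `data` to `path` unless the file already exists."""
    if not path.exists():
        tmp = path.with_name(path.name + ".tmp")
        with tmp.open("wb") as f:
            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
        tmp.replace(path)  # rename last, so readers never see a partial file
    return path


def make_request(share: Path, shared_data, node_data, task_id: int) -> dict:
    """Client side: write both payloads, return a request holding only paths."""
    shared_path = write_once(shared_data, share / "shared.bin")
    node_path = write_once(node_data, share / f"task_{task_id}.bin")
    return {"shared_path": str(shared_path), "node_path": str(node_path)}


# Assumed per-process cache: shared data is loaded once per service host.
_shared_cache = {}


def handle_request(request: dict):
    """Compute-node side: load shared data once, task data on every request."""
    shared_path = request["shared_path"]
    if shared_path not in _shared_cache:
        with open(shared_path, "rb") as f:
            _shared_cache[shared_path] = pickle.load(f)
    with open(request["node_path"], "rb") as f:
        node_data = pickle.load(f)
    return _shared_cache[shared_path], node_data
```

    The request itself stays small because it carries only the two file paths; the cost is in writing and reading the files on the share.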

    We are looking at ways to increase the speed of data transfer.

     

    Thanks,

    Chris

    Friday, June 9, 2017 4:41 PM
  • Hi Chris,

    HPC SOA has a Common Data feature that lets service hosts access a set of shared data; you may check this post and the sample code to get familiar with this feature. Note that we usually use Common Data to share data at the MB level; for GB-sized data, the read time and memory usage could be large. We have a public tool named ClusterCopy that does a tree copy to replicate common data among compute nodes, and also some in-house tools for P2P copy. As for the non-shared data for a specific node or service host, I suppose it would be fine to use a node preparation task to robocopy the private data to the compute node from one or more file shares.
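
    To illustrate why a tree copy helps when replicating shared data: every node that already holds a copy can serve another node in the next round, so the number of copies roughly doubles each round instead of every node pulling from a single file share. The Python sketch below only computes such a schedule. It is not ClusterCopy's actual algorithm, and the function and node names are hypothetical.

```python
from __future__ import annotations


def tree_copy_schedule(source: str, targets: list[str]) -> list[list[tuple[str, str]]]:
    """Return rounds of (sender, receiver) pairs; holders can double each round."""
    have = [source]           # locations that already hold the data
    pending = list(targets)   # nodes still waiting for a copy
    rounds = []
    while pending:
        this_round = []
        # Snapshot `have`: nodes that receive in this round start sending in the next.
        for sender in list(have):
            if not pending:
                break
            receiver = pending.pop(0)
            this_round.append((sender, receiver))
            have.append(receiver)
        rounds.append(this_round)
    return rounds


nodes = [f"node{i}" for i in range(1, 31)]
for i, pairs in enumerate(tree_copy_schedule("fileshare", nodes), start=1):
    print(f"round {i}: {len(pairs)} parallel copies")
```

    With 30 target nodes the schedule needs 5 rounds (1, 2, 4, 8 and 15 parallel copies), assuming each holder sends to one receiver per round.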

    Regards,

    Yutong Sun

    Tuesday, June 13, 2017 9:01 AM
    Moderator