Hi Qiufang,
I have listed the answers to your questions below.
* How big is your data in both cases?
Our data is quite large. Each compute node requires up to 50 GB of data that is specific to it. In addition, there is up to another 50 GB of data to be shared
between all compute nodes.
* And how many requests will have large data?
In our case, all requests will carry (either directly or indirectly) the large amounts of data described above.
* And how big is the cluster?
The clusters we use are composed of a small set of powerful nodes: from 4 to 30 nodes, where each node has 256 GB to 2 TB of RAM.
At the moment we transfer this data by serialising it to a file on a shared network drive. The file path is then passed to the compute nodes in the request
so that they can read the file.
This process is used for both shared and non-shared data. The shared data is written once and read many times, whereas the non-shared data is written to separate
files, each of which is read once.
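
To make the current pattern concrete, here is a minimal sketch of what we do today. The mount point, the pickle format, and the function and field names are illustrative placeholders for this email, not our actual code:

import pickle
from pathlib import Path

SHARED_DRIVE = Path("/mnt/shared")  # hypothetical mount point for the network drive

def write_payload(name, data):
    # Serialise a payload to the shared drive and return its path.
    path = SHARED_DRIVE / f"{name}.pkl"
    with path.open("wb") as f:
        pickle.dump(data, f)
    return str(path)

def build_request(node_id, shared_path, node_data):
    # The request carries only file paths, never the data itself.
    node_path = write_payload(f"node_{node_id}", node_data)  # written once, read once
    return {"shared_path": shared_path, "node_path": node_path}

def handle_request(request):
    # On the compute node: read both payloads back from the shared drive.
    with open(request["shared_path"], "rb") as f:
        shared = pickle.load(f)
    with open(request["node_path"], "rb") as f:
        node_specific = pickle.load(f)
    return shared, node_specific

The shared payload is written once up front with write_payload and its path is reused in every request, while each node-specific payload gets its own file.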
We are looking at ways to increase the speed of data transfer.
Thanks,
Chris