Transfer data to the nodes

  • Question

  • Hi all,

    I'm wondering if there is a way of transferring
    data to the compute nodes?

    Best Regards
    Tuesday, December 15, 2009 9:49 AM


All replies

  • Hi Shay,

    There are a few methods for moving data to and from compute nodes. Can you please tell us a little more about what you would like to do, and where the data is moving "from" and "to"?


    • Marked as answer by Shay Segev Wednesday, December 16, 2009 1:24 PM
    • Unmarked as answer by Shay Segev Wednesday, December 16, 2009 1:24 PM
    Tuesday, December 15, 2009 7:45 PM
  • Hi Patrick,

    I wrote a little demo to explain what I want to do -

    I made an input folder on a shared folder with 100 files,
    and also an output folder for the zipped output files -

    I want every node that compresses files from the input folder into a zipped file in the output folder to do the work on its local storage, not in the shared folder. (Every node that executes a compress task would copy the input files to its local disk, zip the files, and copy the zipped file to the output folder - I think this is called a Node Preparation and a Node Release task.)

                //Create test files on shared storage
                CreateTestFiles(@"\\\data_transfer\Test\Input", 100);
                try
                {
                    job = scheduler.CreateJob();
                    string outputFolder = @"\\\data_transfer\Test\Output\";
                    string inputFolder = @"\\\data_transfer\Test\Input";
                    string executable = @"..\..\testers\zip.exe";
                    task = job.CreateTask();
                    // Parametric sweep: each * in the command line is replaced
                    // by the instance index (1..5)
                    task.CommandLine = executable + " testFileNum*.txt " + outputFolder + "file*.zip";
                    task.IsParametric = true;
                    task.StartValue = 1;
                    task.EndValue = 5;
                    task.IncrementValue = 1;
                    task.WorkDirectory = inputFolder;
                    job.AddTask(task);   // the task must be added to the job
                    job.OnTaskState += TaskStateCallback;
                    job.OnJobState += JobStateCallback;
                    scheduler.SubmitJob(job, @"MSHPC\shai-s", null);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.Message);
                }
    Best Regards

    Wednesday, December 16, 2009 1:47 PM
  • Hi Shay,

    I think you can instead use a batch file (e.g. zip.cmd) as your CommandLine which will do exactly what you would like: (i) copy the input file from the shared folder to local storage on the CN, (ii) zip the file, and then (iii) copy the zipped file back to the shared folder. This will require you to first create input/output folders on each CN that will be running the tasks. Additionally, you can either place the batch file on the shared folder or copy it to each CN. Would that work for you?
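    A batch file along those lines might look something like the sketch below. This is only an illustration: the local working folder C:\LocalWork, the zip.exe invocation, and passing the input file name as %1 are all assumptions, not something from the thread.

    ```bat
    rem zip.cmd - hypothetical sketch of the three steps above
    rem %1 = name of the input file, passed on the task command line
    set LOCAL=C:\LocalWork

    rem (i) copy the input file from the shared folder to local storage on the CN
    copy \\\data_transfer\Test\Input\%1 %LOCAL%\%1

    rem (ii) zip the file on local storage
    zip.exe %LOCAL%\%1.zip %LOCAL%\%1

    rem (iii) copy the zipped file back to the shared output folder
    copy %LOCAL%\%1.zip \\\data_transfer\Test\Output\%1.zip

    rem clean up the local copies
    del %LOCAL%\%1 %LOCAL%\%1.zip
    ```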

    The Node Preparation and Node Release tasks that you mentioned are not available in V2, but are part of a new feature planned for the next version (V3) and are in the V3 Beta1 we recently released. If you're using V3 Beta1, then yes, you could use Node Preparation and Node Release tasks to copy the input and output data. However, you'd need to be careful when designing your parametric task. That is, there is no automatic guarantee that each of your parametric task instances will run on a different node, and Node Preparation and Node Release tasks are designed to run when a job grows onto a node and when a job shrinks from a node, respectively. So, if two consecutive parametric tasks run on the same node, only one instance of the Node Preparation task will run on that node (i.e. when the job starts running on that node).
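    As a rough sketch of how this could be set up against the V3 API (hedged: the head node name, local paths, and copy commands below are assumptions for illustration, and this relies on the V3 TaskType property):

    ```csharp
    using Microsoft.Hpc.Scheduler;
    using Microsoft.Hpc.Scheduler.Properties;

    class NodePrepDemo
    {
        static void Main()
        {
            IScheduler scheduler = new Scheduler();
            scheduler.Connect("headnode");   // hypothetical head node name

            ISchedulerJob job = scheduler.CreateJob();

            // Runs once when the job grows onto a node: stage the inputs locally
            ISchedulerTask prep = job.CreateTask();
            prep.Type = TaskType.NodePrep;
            prep.CommandLine = @"xcopy \\\data_transfer\Test\Input C:\LocalWork\Input /I /Y";

            // The sweep then works against the local copy
            ISchedulerTask sweep = job.CreateTask();
            sweep.Type = TaskType.ParametricSweep;
            sweep.StartValue = 1;
            sweep.EndValue = 5;
            sweep.IncrementValue = 1;
            sweep.CommandLine = @"zip.exe C:\LocalWork\Output\file*.zip C:\LocalWork\Input\testFileNum*.txt";

            // Runs once when the job shrinks from the node: copy results back
            ISchedulerTask release = job.CreateTask();
            release.Type = TaskType.NodeRelease;
            release.CommandLine = @"xcopy C:\LocalWork\Output \\\data_transfer\Test\Output /I /Y";

            job.AddTask(prep);
            job.AddTask(sweep);
            job.AddTask(release);
            scheduler.SubmitJob(job, null, null);
        }
    }
    ```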


    Wednesday, December 16, 2009 6:32 PM
  • Hi Patrick,

    I also thought of using a batch file at first, but I wanted to find a better way of controlling it via the API - adding, for every task, operations to perform before the command line and operations to perform after it.

    Thanks,


    Thursday, December 17, 2009 10:22 AM
  • Hi Shay,

    If you want to set it up via the API without a batch file, then you could try splitting the work up with one job per file (instead of using a single job with a single parametric-sweep task). In that case, each job has three tasks: Task1 to copy the input file; Task2 to zip the file; and Task3 to copy the output file. You will also need to ensure that the tasks run in the correct order. You can do this using dependencies. i.e. You can make Task2 dependent on Task1, and Task3 dependent on Task2. n.b. If you use the API to set up the task dependencies, these dependencies will not show up when the tasks are viewed in the GUI. You can view API-created dependencies using the command-line tools (e.g. task.exe and the Powershell cmdlets).
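    A minimal sketch of one such job, using task names and the DependsOn collection to chain the three tasks (the head node name, local paths, zip tool, and file name are assumptions for illustration):

    ```csharp
    using Microsoft.Hpc.Scheduler;

    class DependencyDemo
    {
        static void Main()
        {
            IScheduler scheduler = new Scheduler();
            scheduler.Connect("headnode");   // hypothetical head node name

            ISchedulerJob job = scheduler.CreateJob();

            ISchedulerTask copyIn = job.CreateTask();
            copyIn.Name = "Task1";
            copyIn.CommandLine = @"copy \\\data_transfer\Test\Input\testFileNum1.txt C:\LocalWork\";

            ISchedulerTask zipTask = job.CreateTask();
            zipTask.Name = "Task2";
            zipTask.DependsOn.Add("Task1");   // Task2 starts only after Task1 finishes
            zipTask.CommandLine = @"zip.exe C:\LocalWork\file1.zip C:\LocalWork\testFileNum1.txt";

            ISchedulerTask copyOut = job.CreateTask();
            copyOut.Name = "Task3";
            copyOut.DependsOn.Add("Task2");   // Task3 starts only after Task2 finishes
            copyOut.CommandLine = @"copy C:\LocalWork\file1.zip \\\data_transfer\Test\Output\";

            job.AddTask(copyIn);
            job.AddTask(zipTask);
            job.AddTask(copyOut);
            scheduler.SubmitJob(job, null, null);
        }
    }
    ```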


    Thursday, December 17, 2009 10:27 PM
  • Hi Patrick,

    I have another question. Let's suppose for a moment that the clients are remote computers on the web, and again I want to send files from the clients to the compute nodes - what is an effective way to do that?

    Best Regards,

    Sunday, December 20, 2009 3:26 PM
  • Hi Shay,

    It depends on the design of your client. Liwei references one example solution (HPCBP) in another forum thread: http://social.microsoft.com/Forums/en-US/windowshpcdevs/thread/4cc082e4-4eb7-4c98-a3e8-38751d5329ee


    • Marked as answer by Shay Segev Monday, January 11, 2010 2:23 PM
    Tuesday, December 22, 2009 1:26 AM