Sending binary payload to the task

  • Question

  • Dear all,

    We have serialized binary data which we want to send to an HPC job/task. How can we do this? The HPC task only exposes standard input and output properties. We reviewed the UserBlob property of the task but couldn't understand how to use it. Below is what we are trying to do:

    1. We have a job with a single task.

    2. The single task executes one of our optimization engines named engine.exe.

    3. We need to pass binary data to this single task so that engine.exe can consume it.

    How can we achieve this scenario programmatically with Windows HPC?

    Thanks,

    Puneet


    Puneet Sharma


    Friday, February 24, 2017 3:04 AM

Answers

  • The bottleneck will be on the SQL side, since you have hundreds of jobs running at the same time. You should double-check the system load on the SQL Server and the head node when pushing more workload into the scheduler system.

    Qiufang Shi

    Thursday, March 2, 2017 12:42 AM

All replies

  • Hi Puneet,

      Questions for me to understand the scenario better:

    1. Where is the data before the job is created? And who is sending the data? The client that creates the job?

    2. Does the data need to be sent to the running task? How big is the data? Does it need to be sent to the task multiple times during execution?

    Some thoughts:

    1. You can pass the data without involving the scheduler: put it in a file share, a database, or a cache system (such as Redis). When creating the job/task, pass the link/location of the data as a task environment variable so that your engine.exe knows where to read it.

    2. If your data is not big, you can store the data itself in an environment variable.

    3. Or you can check out HPC SOA. Your client just needs to create a session; the session job will spin up engine.exe on all nodes, and the client then sends the data through requests.
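    A minimal sketch of option 2 above (in Python, standing in for the client and for engine.exe). The variable name `ENGINE_PAYLOAD` and the base64 convention are our own illustrative choices, not anything HPC Pack prescribes; binary bytes cannot go directly into an environment variable, but a base64 text string can:

    ```python
    import base64
    import os

    payload = bytes([0x00, 0xFF, 0x10, 0x80])  # example serialized binary data

    # Client side: encode the bytes as base64 text, which is safe to store
    # in an environment variable (no NULs or shell-special characters).
    encoded = base64.b64encode(payload).decode("ascii")

    # Simulating what the scheduler would do when it sets the task
    # environment variable ENGINE_PAYLOAD for the task process.
    os.environ["ENGINE_PAYLOAD"] = encoded

    # Engine side: read the variable back and decode to the original bytes.
    decoded = base64.b64decode(os.environ["ENGINE_PAYLOAD"])
    assert decoded == payload
    ```

    The same round trip works in any language; the engine only needs a base64 decoder for whatever encoding the client chose.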


    Qiufang Shi

    Friday, February 24, 2017 7:57 AM
  • Dear Shi,

    Please find the answers to your questions below:

    1. When a job is being created, we want to send this data along with the job/task. This is input data for the job which the client doesn't want to store in their DB, so they send it during job creation.

    2. We don't want to send the data to the running task; we want to send this input data during job creation. The data might be a few KBs, and it only needs to be sent once, at task creation.

    Can we put a few KBs of binary data in an environment variable?

    Thanks,

    Puneet


    Puneet Sharma



    Friday, February 24, 2017 12:14 PM
  • The limit is around 32KB for one user-defined environment variable, so you can leverage this. If your data is larger than that, you might need to split it across multiple environment variables.

    Please also choose appropriately between job environment variables and task environment variables (all tasks in the job get a copy of the job environment variables).

    And check whether you need to escape special characters in your data before putting it into an environment variable.
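    The splitting idea above can be sketched as follows (a Python illustration under our own conventions: the variable prefix `PAYLOAD`, the `_COUNT` suffix, and the 30,000-character chunk size are all assumptions chosen to stay safely under the ~32KB per-variable limit). Base64 encoding also sidesteps the character-escaping concern:

    ```python
    import base64

    CHUNK = 30_000  # characters per variable, safely under the ~32KB limit

    def to_env_chunks(data: bytes, name: str = "PAYLOAD") -> dict:
        """Split a base64-encoded payload across numbered environment variables."""
        text = base64.b64encode(data).decode("ascii")
        parts = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)]
        env = {f"{name}_{i}": part for i, part in enumerate(parts)}
        env[f"{name}_COUNT"] = str(len(parts))  # tells the engine how many to read
        return env

    def from_env_chunks(env: dict, name: str = "PAYLOAD") -> bytes:
        """Reassemble the payload from the numbered environment variables."""
        count = int(env[f"{name}_COUNT"])
        text = "".join(env[f"{name}_{i}"] for i in range(count))
        return base64.b64decode(text)

    blob = bytes(range(256)) * 300          # ~76KB of binary data
    env = to_env_chunks(blob)               # client sets these on the task
    assert from_env_chunks(env) == blob     # engine reconstructs the payload
    ```

    The client would set each entry of `env` as a task environment variable; the engine reads them back from its process environment in the same order.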



    Qiufang Shi

    Monday, February 27, 2017 4:27 AM

  • Dear Qiufang,

    Thanks a lot for helping us out.

    In our scenario, we have a simulation engine which we want to run with Windows HPC Pack. We provide this simulation engine executable as the task's command line in the HPC job.

    Now this simulation engine needs some binary payload which we want to send as an environment variable. I have set this environment variable while creating the task, but I don't know how my engine executable will access it.

    Can you let me know how my engine can consume this binary payload?

    Thanks,

    Puneet



    Tuesday, February 28, 2017 11:20 PM
  • Hi Puneet,

    The environment variable can be accessed by your process directly, just like any other system environment variable such as %tmp%, %ComputerName%, etc.

    For example, you can add an environment variable to the task, myenv=myvalue, and set the task command line to "echo %myenv%"; when the task runs, the output will be "myvalue".
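    The same lookup from inside an engine process, sketched in Python (the pre-set value here stands in for what the scheduler would configure on the task):

    ```python
    import os

    # Stand-in for the scheduler setting the task environment variable.
    os.environ["myenv"] = "myvalue"

    # Engine side: read it just like any other environment variable.
    value = os.environ.get("myenv")
    print(value)  # prints: myvalue
    ```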


    Qiufang Shi

    Wednesday, March 1, 2017 1:09 AM
  • Thanks Qiufang. This works. Can we have 32KB environment variables for all the jobs we are executing in parallel? We have hundreds of jobs running in parallel.

    Puneet Sharma

    Wednesday, March 1, 2017 2:27 AM