HPC Headnode disconnects when submitting jobs with many tasks

  • Question

  • I am trying to submit jobs with > 10,000 tasks from a C# application, and it almost always causes my application to disconnect from HPC, and the HPC Cluster Manager GUI hangs.

    The HPC headnode is running on Server 2012 with 16 cores and 512GB RAM. It is also running SQL Server Express for the HPC databases. It is not a compute node.

    I am using HPC Pack 2012 R2 4.2.4400.0 

    I add all tasks using ISchedulerJob.AddTasks(ISchedulerTask[]) and then submit the job, so all tasks start at the same time.

    Should HPC be able to cope with jobs of this size without disconnecting and locking up? Is there a better way to submit?

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using Microsoft.Hpc.Scheduler;

    public static class Program
    {
        public static void Main()
        {
            var scheduler = new Scheduler();
            scheduler.Connect("MyHeadNode");
            scheduler.OnReconnect += (sender, msg) => Console.WriteLine(msg.Code + " " + msg.Exception);

            var job = scheduler.CreateJob();
            scheduler.AddJob(job);

            // Build all 10,000 tasks up front and add them in a single call.
            const int taskCount = 10000;
            var tasks = new List<ISchedulerTask>();
            for (int i = 1; i <= taskCount; i++)
            {
                var task = job.CreateTask();
                task.CommandLine = "echo " + i;
                tasks.Add(task);
            }
            job.AddTasks(tasks.ToArray());

            scheduler.SubmitJob(job, null, null);
            Thread.Sleep(30 * 1000); // wait and see if you get reconnected
        }
    }

    Wednesday, September 16, 2015 4:52 PM

Answers

  • Hi,

        It is not efficient to have so many batch tasks in one job while you are using SQL Server Express. You can try a parametric sweep task instead: its task instances are expanded dynamically during execution, so the load on SQL Server stays small. Please check the sample here: https://msdn.microsoft.com/en-us/library/cc853429(v=vs.85).aspx

        If you do need to submit jobs with more than 10,000 tasks that have different command lines, we would suggest looking into the SOA model, which is far more efficient than batch jobs (it can easily reach thousands of requests per second). We are happy to help if you have problems using SOA.
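
        A parametric sweep submission might look roughly like the sketch below. This is an untested outline, not code from the thread: it assumes the HPC Pack scheduler API's `TaskType.ParametricSweep` with the `StartValue`/`EndValue`/`IncrementValue` properties, and the convention that `*` in the command line is replaced by the sweep index at run time; check the linked sample for the authoritative usage.

        ```csharp
        using Microsoft.Hpc.Scheduler;
        using Microsoft.Hpc.Scheduler.Properties;

        public static class SweepExample
        {
            public static void Main()
            {
                var scheduler = new Scheduler();
                scheduler.Connect("MyHeadNode"); // hypothetical headnode name from the question

                var job = scheduler.CreateJob();
                scheduler.AddJob(job);

                // One sweep task stands in for 10,000 individual tasks; the
                // scheduler expands instances during execution, so only a single
                // task definition hits the database at submit time.
                var sweep = job.CreateTask();
                sweep.Type = TaskType.ParametricSweep;
                sweep.StartValue = 1;
                sweep.EndValue = 10000;
                sweep.IncrementValue = 1;
                sweep.CommandLine = "echo *"; // '*' is replaced by the sweep index
                job.AddTask(sweep);

                scheduler.SubmitJob(job, null, null);
            }
        }
        ```

        Because only one task definition is stored, the submit call stays small no matter how large the sweep range is, which is why it avoids the SQL Server Express load problem described above.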

    Thursday, September 17, 2015 5:17 AM

All replies

  • Thanks for your help; the parametric sweep does exactly what I want!
    Thursday, September 17, 2015 8:47 AM