Hello,
I am using HPC 2012.
I am having trouble requeuing a job that contains node preparation tasks.
Here is the exact scenario.
1. The first time the job runs, the node preparation task fails on ALL nodes (I made it fail on every node deliberately for the purpose of the test).
2. I requeue the job from the C# client by calling the following methods:
IScheduler.ConfigureJob(jobId);
IScheduler.SubmitJobById(jobId,...);
3. The requeue fails (before the failed node prep task has a chance to run again) with the following exception:
Microsoft.Hpc.Scheduler.Properties.SchedulerException: This job requires at least 1 cores, but the list of candidate nodes that the Job Scheduler service returned for this job contains only 0 cores. The Job Scheduler service determines the candidate node list using the following job properties: NodeGroup, RequestedNodes, MinMemoryPerNode, MaxMemoryPerNode, MinCoresPerNode, MaxCoresPerNode, and ExcludedNodes. Either reduce the number of resources that the job requires, or redefine the relevant job properties, and then submit the job again.
4. When I check the status of the node prep task in HPC Cluster Manager, I see that it is canceled with the following reason:
This sub-task was canceled because it could not be requeued along with the rest of the job. Another sub-task will be created to replace it.
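For reference, the requeue in step 2 amounts to roughly the following. This is a minimal sketch, not my exact code: the head node name "HEADNODE" and the job ID 42 are placeholders, the credentials are passed as null on the assumption that cached credentials are available, and the dump of ExcludedNodes is only a diagnostic added here because the exception lists that property among the ones used to build the candidate node list.

```csharp
using System;
using Microsoft.Hpc.Scheduler;

class RequeueSketch
{
    static void Main()
    {
        const string headNode = "HEADNODE"; // placeholder head node name
        const int jobId = 42;               // placeholder job ID

        IScheduler scheduler = new Scheduler();
        scheduler.Connect(headNode);

        // Diagnostic only: show the job state and any excluded nodes
        // before attempting the requeue.
        ISchedulerJob job = scheduler.OpenJob(jobId);
        Console.WriteLine("Job state: {0}", job.State);
        foreach (string node in job.ExcludedNodes)
        {
            Console.WriteLine("Excluded node: {0}", node);
        }

        // Move the failed job back to the Configuring state, then resubmit it.
        scheduler.ConfigureJob(jobId);

        // null credentials: assumes cached credentials exist for this user.
        scheduler.SubmitJobById(jobId, null, null);
    }
}
```

The SchedulerException quoted in step 3 occurs during this requeue, before the node preparation task gets a chance to run again.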
My understanding was that requeuing a job would requeue all failed tasks in that job, so why is the node prep task not run again?
Thanks