Types of events OnJobState and OnTaskState generate

  • Question

  • OnTaskState doesn't seem to generate an event with state TaskState.Finished. The documentation http://msdn.microsoft.com/en-us/library/microsoft.hpc.scheduler.ischedulerjob.ontaskstate(v=VS.85).aspx leads me to believe that it would, but it doesn't say so definitively.

    Similarly, OnJobState doesn't generate an event with JobState.Finished.

    Has anyone else noticed this behavior? I think the correct behavior is for an event to be generated on every transition, as the documentation suggests. That raises the question: is this a bug?

    The following code reproduces this behavior (you'll need your own Sleep.exe or an equivalent).

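    using System;
    using System.Threading;
    using Microsoft.Hpc.Scheduler;
    using Microsoft.Hpc.Scheduler.Properties;

    class Program
    {
        static void Main(string[] args)
        {
            IScheduler scheduler = new Scheduler();
            ISchedulerJob spJob = null;

            try
            {
                scheduler.Connect("my_hpc_head");

                // One job with 100 short tasks; Sleep.exe is any program that waits N seconds
                spJob = scheduler.CreateJob();
                for (int i = 0; i < 100; i++)
                {
                    ISchedulerTask spTask = spJob.CreateTask();
                    spTask.CommandLine = "Sleep.exe 5";
                    spJob.AddTask(spTask);
                }

                // Subscribe before submitting so the earliest transitions are reported too
                spJob.OnJobState += new EventHandler<JobStateEventArg>(MyJobState);
                spJob.OnTaskState += new EventHandler<TaskStateEventArg>(MyTaskState);

                scheduler.SubmitJob(spJob, null, null);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                return; // nothing to monitor if connecting or submitting failed
            }

            // Poll until the job reaches a terminal state; checking only Finished
            // would spin forever on a failed or canceled job
            while (true)
            {
                spJob.Refresh();
                if (spJob.State == JobState.Finished ||
                    spJob.State == JobState.Failed ||
                    spJob.State == JobState.Canceled)
                    break;
                Thread.Sleep(1000); // don't hammer the head node with Refresh() calls
            }
            Console.WriteLine("program ended");
        }

        // Called by the scheduler API whenever the job changes state
        public static void MyJobState(object o, JobStateEventArg e)
        {
            Console.WriteLine(" got Job transition from " + e.PreviousState + " to " + e.NewState);
        }

        // Called by the scheduler API whenever one of the job's tasks changes state
        public static void MyTaskState(object o, TaskStateEventArg e)
        {
            Console.WriteLine(" got task transition from " + e.PreviousState + " to " + e.NewState);
        }
    }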
    


    My output looks like:
     got Job transition from Configuring to Submitted
     got Job transition from Submitted to Validating
     got task transition from Submitted to Queued
        --- SNIP ---
     got task transition from Submitted to Queued
     got task transition from Submitted to Queued
       --- SNIP ---
     got task transition from Submitted to Queued
    program ended
    • Edited by Cheng11 Thursday, December 30, 2010 2:18 PM
    Tuesday, December 28, 2010 4:31 PM

All replies

  • Hi,

    I tried to compile your sample code, but got the following error:

    The name 'spJob' does not exist in the current context

    on the line: spJob.Refresh();

    I declared spJob right after IScheduler scheduler = new Scheduler(); with the line: ISchedulerJob spJob = null;

    After this change the program compiled, and my output looked like this (with the number of tasks limited to 5):

     got Job transition from Configuring to Submitted
     got Job transition from Submitted to Validating
     got task transition from Submitted to Queued
     got task transition from Submitted to Queued
     got task transition from Submitted to Queued
     got task transition from Submitted to Queued
     got task transition from Submitted to Queued
     got Job transition from Validating to Queued
     got Job transition from Queued to Running
     got task transition from Queued to Dispatching
     got task transition from Queued to Dispatching
     got task transition from Queued to Dispatching
     got task transition from Queued to Dispatching
     got task transition from Queued to Dispatching
     got task transition from Dispatching to Running
     got task transition from Dispatching to Running
     got task transition from Dispatching to Running
     got task transition from Dispatching to Running
     got task transition from Dispatching to Running
     got task transition from Running to Finished
     got task transition from Running to Finished
     got task transition from Running to Finished
     got task transition from Running to Finished
     got task transition from Running to Finished
     got Job transition from Running to Finished
    program ended

    In your output I see task transitions to the Queued state, but no Dispatching or Running states. That is odd, because the program terminates only after the job monitored in the polling loop reaches the Finished state. Is it possible that your original code creates more than one job (which would explain my compilation error) and that, by mistake, the polling loop is monitoring not the job your event handlers are attached to but another one that starts and finishes earlier?

    If this is not the case, could you give me more details about your environment, such as your cluster configuration and Windows HPC Server version?

    Thanks,
    Łukasz

    Thursday, December 30, 2010 4:29 AM
  • Lukasz,

    This smells like a heisenbug. My configuration: Windows 2008 R2, HPC client/server version 3.0.2369, and a single "enterprise" network. I am compiling with Visual Studio 2008 on Vista.

    I ran Windows Update on my client and head node and rebooted both. I now get JobState.Finished events, so consider that matter solved.

    However, it is still losing some events, and not consistently. Do you happen to know whether the source event queue can fill up and drop events? As you can see from my output, task 144.1 goes from Dispatching to Finished without a Running state.

      got Job transition from Submitted to Validating
        got task(144.1) transition from Submitted to Queued
        got task(144.2) transition from Submitted to Queued
        got task(144.3) transition from Submitted to Queued
        got task(144.4) transition from Submitted to Queued
        got task(144.5) transition from Submitted to Queued
      got Job transition from Validating to Queued
    hey submitjob returned
      got Job transition from Queued to Running
        got task(144.1) transition from Queued to Dispatching
        got task(144.2) transition from Queued to Dispatching
        got task(144.2) transition from Dispatching to Running
        got task(144.2) transition from Running to Finished
        got task(144.3) transition from Queued to Dispatching
        got task(144.1) transition from Dispatching to Finished
        got task(144.3) transition from Dispatching to Running
        got task(144.4) transition from Queued to Dispatching
        got task(144.3) transition from Running to Finished
        got task(144.4) transition from Dispatching to Running
        got task(144.5) transition from Queued to Dispatching
        got task(144.4) transition from Running to Finished
        got task(144.5) transition from Dispatching to Running
        got task(144.5) transition from Running to Finished
      got Job transition from Running to Finished
    program ended
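A minimal sketch of how a client could flag incomplete task histories like task 144.1 above: remember the last state reported for each task and list every lifecycle state between it and the state the new event reports. It assumes the normal lifecycle is Submitted, Queued, Dispatching, Running, Finished in that order (taken from the output in this thread) and ignores Failed and Canceled. `TaskStateLite` and `TransitionGapDetector` are hypothetical stand-ins so the sketch compiles without the HPC Pack SDK; they are not part of the scheduler API.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the HPC TaskState enum, restricted to the
// normal lifecycle seen in this thread and declared in that order.
public enum TaskStateLite { Submitted, Queued, Dispatching, Running, Finished }

public class TransitionGapDetector
{
    // Last NewState reported for each task id.
    private readonly Dictionary<string, TaskStateLite> lastSeen =
        new Dictionary<string, TaskStateLite>();

    // Returns the lifecycle states that were never reported between the last
    // event seen for this task and the current one (empty if nothing was skipped).
    public List<TaskStateLite> Observe(string taskId, TaskStateLite previous, TaskStateLite next)
    {
        TaskStateLite start;
        // Events can arrive on different threads, so guard the dictionary.
        lock (lastSeen)
        {
            // For the first event of a task, trust the event's own PreviousState.
            if (!lastSeen.TryGetValue(taskId, out start))
                start = previous;
            lastSeen[taskId] = next;
        }

        var missing = new List<TaskStateLite>();
        for (int s = (int)start + 1; s < (int)next; s++)
            missing.Add((TaskStateLite)s);
        return missing;
    }
}
```

Inside a handler such as MyTaskState you would map e.PreviousState and e.NewState onto the stand-in enum and key by a task identifier string. A non-empty result only says the reported history has a hole: it cannot tell whether the scheduler skipped the state or the notification was dropped, and it never fires for a task whose events stop arriving altogether.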

    Thursday, December 30, 2010 3:24 PM
  • Hi,

    Are you getting these 'Dispatching to Finished' messages with tasks as long as 'sleep 5'? I managed to reproduce this, but only with very short tasks that use 'hostname' as the command line.

    Anyway, I am opening a bug for this behavior, so thank you for reporting it. Could you provide more details about the scenario in which you use job/task events, and why the lost states are a problem for you?

    Thank you,
    Łukasz

    Thursday, January 6, 2011 7:10 PM
  • I was using "echo hello" when it transitioned from "Dispatching to Finished".

    I haven't paid much attention to it; as long as the Finished event fires, my progress tracking works.

    The missing state was mostly a curiosity; my concern was that "Finished" events might also disappear.
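Progress tracking that relies only on the Finished notification, as described here, can be sketched as a counter of distinct tasks that have reported Finished. `ProgressCounter` is a hypothetical helper, not part of the HPC API, and the task identifier is whatever string your OnTaskState handler derives from the event. Using a set makes a duplicate notification harmless, but a Finished event that never arrives still leaves the count short, which is exactly the concern above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts distinct tasks that have reported Finished.
public class ProgressCounter
{
    private readonly HashSet<string> finished = new HashSet<string>();
    private readonly int totalTasks;

    public ProgressCounter(int totalTasks)
    {
        if (totalTasks <= 0) throw new ArgumentOutOfRangeException("totalTasks");
        this.totalTasks = totalTasks;
    }

    // Call from the task-state handler when the new state is Finished.
    // Returns true only the first time a given task is reported.
    public bool MarkFinished(string taskId)
    {
        lock (finished) { return finished.Add(taskId); }
    }

    // Fraction of tasks known to be finished, between 0.0 and 1.0.
    public double Fraction
    {
        get { lock (finished) { return (double)finished.Count / totalTasks; } }
    }
}
```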

    Monday, January 24, 2011 5:07 AM
    I'm seeing similar behaviour: events for some states are never received. It only seems to happen when I submit thousands of jobs, but for about 5 in 1,000 jobs I never receive a "Running" or "Finished" event.

    Was there ever a specific bug fix for this? If so, in which version? I am using HPC Pack 2008 R2, server 3.3.3950.0 and client 3.4.4169.0.

    Thanks

    Tim

    Wednesday, July 3, 2013 9:09 AM
  • I was told by our Microsoft support contact that eventing through the API is not reliable.

    To make it reliable, I wrote a class that keeps track of jobs and tasks in progress. I still use the OnJobState and OnTaskState events, but I also poll HPC every minute for the status of the jobs I believe are still in progress and look for any whose state has changed.
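A minimal sketch of that events-plus-polling approach, under stated assumptions: the tracker stores the last known state of each job in progress, events update it as usual, and a periodic reconcile step compares it with a freshly polled state. Polling is modelled as a delegate so the sketch runs without a cluster; `JobTracker`, `JobStateLite` and the delegate are hypothetical names, not part of the HPC Pack API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the HPC JobState enum (subset).
public enum JobStateLite { Queued, Running, Finished, Failed, Canceled }

public class JobTracker
{
    private readonly Dictionary<int, JobStateLite> inProgress = new Dictionary<int, JobStateLite>();
    private readonly object gate = new object();

    private static bool IsTerminal(JobStateLite s)
    {
        return s == JobStateLite.Finished || s == JobStateLite.Failed || s == JobStateLite.Canceled;
    }

    // Start tracking a newly submitted job.
    public void Track(int jobId, JobStateLite initial)
    {
        lock (gate) { inProgress[jobId] = initial; }
    }

    // Called from the job-state event handler.
    public void OnEvent(int jobId, JobStateLite newState)
    {
        lock (gate)
        {
            if (!inProgress.ContainsKey(jobId)) return; // untracked or already done
            if (IsTerminal(newState)) inProgress.Remove(jobId);
            else inProgress[jobId] = newState;
        }
    }

    // Called periodically (e.g. once a minute). Returns the ids of jobs whose
    // polled state differs from the last event-derived state.
    public List<int> Reconcile(Func<int, JobStateLite> pollState)
    {
        Dictionary<int, JobStateLite> snapshot;
        lock (gate) { snapshot = new Dictionary<int, JobStateLite>(inProgress); }

        var missed = new List<int>();
        foreach (var entry in snapshot)
        {
            // Poll outside the lock so slow scheduler calls don't block event handlers.
            JobStateLite polled = pollState(entry.Key);
            if (polled == entry.Value) continue;
            lock (gate)
            {
                JobStateLite current;
                // Skip if an event already updated or removed this job meanwhile.
                if (!inProgress.TryGetValue(entry.Key, out current) || current != entry.Value) continue;
                missed.Add(entry.Key);
                if (IsTerminal(polled)) inProgress.Remove(entry.Key);
                else inProgress[entry.Key] = polled;
            }
        }
        return missed;
    }

    public int InProgressCount { get { lock (gate) { return inProgress.Count; } } }
}
```

In a real program the delegate would wrap something like opening the job by id from the connected scheduler and reading its State, mapped onto the stand-in enum. Polling outside the lock, and re-checking the stored state before overwriting it, keeps a late poll result from undoing a newer event.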

    Wednesday, September 4, 2013 9:38 AM