Cancelling job within Activation Filter

  • Question

  • Hello everyone,

    I've implemented a simple activation filter for my cluster that cancels jobs under certain conditions. What I've found is that when I call scheduler.CancelJob, the activation filter hangs until it is eventually terminated by the activation filter's timeout, and only at that point is the job actually cancelled. The cancel is successful, but until the timeout passes my job queue is delayed, and I cannot execute any code after the cancel call because the activation filter is killed.

    This does not appear to be correct behavior. Other calls to the scheduler API (e.g., CreateJob, SubmitJob) work fine. Is there a setting I'm missing?

    Best,

    Scott
    Thursday, January 21, 2010 2:57 PM

Answers

  • No additional information was received on this issue. Assuming it has been resolved & closing.
    • Marked as answer by Don Pattee Friday, February 4, 2011 10:24 PM
    Friday, February 4, 2011 10:24 PM

All replies

  • Scott,

    I am seeing the same problem. Please help.

    Matt
    Thursday, January 21, 2010 7:34 PM
  • Would it be possible to see your activation filter code? 

    There is also an activation filter sample: http://technet.microsoft.com/en-us/library/dd346641(WS.10).aspx that cancels a job. The sample worked fine with V2 RTM.
    Friday, January 22, 2010 2:54 PM
  • Hi Steve,

    Examples always help. :)  Here is a simple example of what I am doing/seeing:

            private static void Main(string[] args)
            {
                if (args.Length != 1)
                {
                    _log.ErrorFormat("Activation filter expects 1 argument but found {0}.", args.Length);
                    Environment.Exit(0);
                }

                var jobFile = args[0];

                if (!File.Exists(jobFile))
                {
                    _log.Error("The specified job file could not be found.");
                    Environment.Exit(0);
                }

                var headnode = ConfigurationManager.AppSettings.Get("headnode");
                var hpcScheduler = new Scheduler();
                hpcScheduler.Connect(headnode);

                var jobId = GetJobId(jobFile);

                _log.InfoFormat("Canceling job {0}.", jobId);

                hpcScheduler.CancelJob(jobId, string.Empty);

                _log.Info("This will not be called.");
            }


    _log is just a simple log4net logger.

    In the Cluster Manager the job stays in the "Active" pane with the error message "Canceled by user: Message:None" until the timeout for the activation filter expires.

    The output I see in the log file (after the job has disappeared from the Active pane) is:

    INFO - Logger loaded.
    INFO - Canceling job 11.

    Note that the line "_log.Info("This will not be called.");" is never called.

    This is occurring in both 2008 and 2008 R2 Beta.
    Friday, January 22, 2010 3:50 PM
  • The problem may be that your activation filter's entry point is "private static void Main(string[] args)".
    It should be "public static int Main(string[] args)".

    After calling CancelJob you should return 0.
    Returning 0 will execute the job. Returning 1 will "block the queue" in order to reserve the resources until a license is available.

    The hanging behavior may be because the scheduler is waiting for a return value, and your void Main doesn't return anything, so eventually the activation filter times out.

    Here is an example that cancelled the job just fine (I was testing on HPC 2008 R2):

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Xml;
    using Microsoft.Hpc.Scheduler;

    namespace CancelAllActivationFilter
    {
        class Program
        {
            /// <summary>
            /// This is an activation filter to be used only for testing purposes.
            /// It will cancel all jobs unless there is an error of some sort in the activation filter.
            /// If there is an error the job will block the queue.
            /// </summary>
            /// <param name="args">
            /// job xml file
            /// </param>
            /// <returns>
            /// 0 to execute the job (job has been canceled)
            /// 1 to block the queue (there was an error of some sort detected in the filter)
            /// </returns>
            public static int Main(string[] args)
            {
                if (args.Length != 1) {
                    return 1; // block
                }

                int jobID = 0;
                try {
                    XmlDocument inputJob = new XmlDocument();
                    inputJob.Load(args[0]);
                    // The base XML node in the document.
                    XmlNode job = inputJob.DocumentElement;
                    // Create the namespace that is used for the job XML schema.
                    XmlNamespaceManager nsmgr = new XmlNamespaceManager(inputJob.NameTable);
                    nsmgr.AddNamespace(
                        "ab", @"http://schemas.microsoft.com/HPCS2008/scheduler/");
                    // Get the job ID in case the job needs to be canceled.
                    XmlNode jobidnode = job.SelectSingleNode(@"@Id", nsmgr);
                    if (jobidnode != null) {
                        string JobIdStr = jobidnode.InnerXml;
                        Int32.TryParse(JobIdStr, out jobID);
                    }
                    else {
                        return 1; // block
                    }
                }
                catch {
                    return 1; // block
                }

                try
                {
                    using (IScheduler scheduler = new Scheduler()) {
                        scheduler.Connect("localhost");
                        String message = "No error in activation filter, job canceled";
                        scheduler.CancelJob(jobID, message);
                        return 0; // execute job
                    }
                }
                catch {
                    return 1; // block
                }
            } // end Main
        }
    }
    Friday, January 22, 2010 6:12 PM
  • Changing Main to return an int and returning 0 does not stop the CancelJob call from blocking my process. In the code for the first activation filter that I wrote (which applies the appropriate business logic) I called Environment.Exit(0), which has the same effect, and I was encountering the same problem.

    Out of curiosity, if you add the appropriate code to cause some side effect after the call to CancelJob (e.g., creating a file in a well-known location), does that code execute?

    Again, the issue I am seeing is that the CancelJob call is succeeding in its cancel operation, but blocking until the process is terminated.
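
    The side-effect check suggested above could be sketched roughly as follows. This is only an illustration: CancelProbe, Mark, Run and the marker file name are hypothetical names that do not come from the thread, and the scheduler call is replaced by an empty stand-in so that the sketch runs without a cluster. In a real filter the delegate would wrap scheduler.CancelJob(jobId, message).

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper; none of these names come from the thread.
public static class CancelProbe
{
    // Appends a timestamped line to the marker file. File.AppendAllText opens,
    // writes and closes the file on every call, so the line is not left in a
    // process-local buffer if the filter is killed right afterwards.
    public static void Mark(string markerPath, string stage)
    {
        File.AppendAllText(markerPath,
            DateTime.UtcNow.ToString("o") + " " + stage + Environment.NewLine);
    }

    // Writes one marker before and one after the supplied call, then returns 0,
    // the exit code the thread describes as "execute the job".
    public static int Run(string markerPath, Action cancelCall)
    {
        Mark(markerPath, "before CancelJob");
        cancelCall(); // in a real filter: scheduler.CancelJob(jobId, message)
        Mark(markerPath, "after CancelJob");
        return 0;
    }

    public static int Main(string[] args)
    {
        // Assumed well-known location; any path the filter account can write to would do.
        string markerPath = Path.Combine(Path.GetTempPath(), "activation-filter-probe.log");

        // Empty stand-in for the scheduler call so the sketch runs without a cluster.
        return Run(markerPath, () => { });
    }
}
```

    If the marker file contains only the "before CancelJob" line once the activation filter timeout has expired, the process was terminated while still inside the cancel call; if both lines are present, the call returned and the delay lies elsewhere.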
    Friday, January 22, 2010 7:40 PM
  • Also, thank you for helping me with this - I don't mean to come off as ungrateful. :)
    Friday, January 22, 2010 7:41 PM