Scheduling Jobs Using SOA

  • Question

  • I've been trying to submit jobs to HPC using an EchoService and EchoClient that I created by following this Microsoft SOA whitepaper.  I followed the instructions to the letter and deployed the service on each node in my cluster.  However, when I run the EchoService client application, I hit an unhandled exception (I've copied the exact code below, with the exception in bold).

    The HPC Cluster Manager recognizes the jobs that I submit, but for some reason they end up in the "Failed" category.  Each job I submit has two tasks.  The first is a WCF Broker task, so I believe it is the parent job, while the other task is assigned to individual nodes by its parent.  I assume this means that my EchoService is working, because the jobs are being registered by the Cluster Manager.  So I can only guess that the problem is either with the client or with some permission or firewall setting that I have enabled on the cluster.

    Any suggestions would be very welcome! 

    Here's the EchoService Client code that I'm using (unhandled exception in bold):

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.ServiceModel;
    using System.Threading;
    using Microsoft.Hpc.Scheduler.Session;

    namespace EchoClient
    {
        class Program
        {
            static void Main(string[] args)
            {
                string scheduler = "localhost";
                string serviceName = "EchoService";

                if (args.Length > 0)
                {
                    scheduler = args[0];
                    if (args.Length > 1)
                    {
                        serviceName = args[1];
                    }
                }

                // Create a session object that specifies the head node
                // to which to connect
                // and the name of the WCF service to use.
                // This example uses the default start information for a
                // session.
                SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);
                info.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Node;
                info.MinimumUnits = 1;
                info.MaximumUnits = 4;

                Console.WriteLine("Creating a session...");
                // Create the session by calling the factory method
                using (Session session = Session.CreateSession(info))
                {
                    Console.WriteLine("Session's Endpoint Reference:{0}", session.EndpointReference.ToString());

                    // Binds session to the client proxy using NetTcp
                    // binding (specify only NetTcp binding). The
                    // security mode must be Transport and you cannot
                    // enable reliable sessions.
                    EchoServiceClient client = new EchoServiceClient(new NetTcpBinding(SecurityMode.Transport, false), session.EndpointReference);

                    AsyncResultCount = 100;

                    // EchoCallback is defined below in this class.
                    for (int i = 0; i < 100; i++)
                    {
                        // This call does not block; as results become
                        // available, the EchoCallback method is invoked.
                        client.BeginEcho("hello world", EchoCallback, new RequestState(client, i));
                    }
                    AsyncResultsDone.WaitOne();

                    client.Close();
                    Console.WriteLine("Please enter any key to continue...");
                    Console.ReadLine();
                }
            }

            static int AsyncResultCount = 0;
            static AutoResetEvent AsyncResultsDone = new AutoResetEvent(false);

            // Encapsulates the context of the function callback
            class RequestState
            {
                int input;
                EchoServiceClient client;

                public RequestState(EchoServiceClient client, int input)
                {
                    this.client = client;
                    this.input = input;
                }

                public int Input
                {
                    get { return input; }
                }

                public string GetResult(IAsyncResult result)
                {
                    return client.EndEcho(result);
                }
            }

            static void EchoCallback(IAsyncResult result)
            {
                RequestState state = result.AsyncState as RequestState;

                Console.WriteLine("Response({0}) = {1}", state.Input, state.GetResult(result));


                if (Interlocked.Decrement(ref AsyncResultCount) <= 0)
                {
                    AsyncResultsDone.Set();
                }
            }
        }
    }
    Thursday, September 11, 2008 7:40 PM

Answers

  • Your client code seems fine. There are several things you can do here:

    1. Look at the exception thrown by Session.CreateSession(info); it normally contains useful information. If you use Visual Studio, you should be able to catch the exception and see the message.

    2. You can always run the two diagnostic tests from your AdminConsole. One of them will tell you which services are installed on your cluster (including your EchoService); the other will run a basic http/net.tcp service on the cluster.

    3. Since both of your jobs went to "Failed", it's more likely that something is wrong on the cluster and is causing the broker or service to fail. You can double-click the task and check the error output for detailed information.
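
    As a minimal sketch of step 1 (not part of the SDK sample), the helper below walks the InnerException chain so that the nested scheduler message is listed alongside the outer one. The class name ExceptionChain and the method Describe are made up for illustration; only System.Exception members are used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of HPC Pack.
static class ExceptionChain
{
    // Returns one "Type: Message" line per exception, outermost first.
    public static List<string> Describe(Exception ex)
    {
        var lines = new List<string>();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            lines.Add(e.GetType().FullName + ": " + e.Message);
        }
        return lines;
    }
}
```

    In the client, this would mean wrapping the using (Session session = Session.CreateSession(info)) block in try/catch (Exception ex) and writing each line returned by ExceptionChain.Describe(ex) to the console.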

    Note: there are some terms I want to disambiguate:

    Session - A "session" is the unit of work you start when you want to run a SOA calculation. Each Session.CreateSession() call creates one session for you.

    Job - A "job" is the scheduler's resource allocation unit. Each session includes two jobs: one runs on a broker node as the broker, and the other runs on ordinary compute nodes as the service.

    Task - A "task" is the actual running worker. The broker job has one task, the broker task. The service job, on the other hand, has multiple tasks, each representing one worker serving the incoming SOA requests.
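
    The relationship between these three terms can be sketched as a small object model. These are hypothetical illustration types, not classes from the HPC Pack API, and the number of service tasks is simply passed in as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model types for illustration only; not HPC Pack API classes.
class TaskModel
{
    public string Role;
    public TaskModel(string role) { Role = role; }
}

class JobModel
{
    public string Kind; // "Broker" or "Service"
    public List<TaskModel> Tasks = new List<TaskModel>();
    public JobModel(string kind) { Kind = kind; }
}

class SessionModel
{
    // One session owns exactly two jobs.
    public JobModel BrokerJob = new JobModel("Broker");
    public JobModel ServiceJob = new JobModel("Service");

    // The broker job gets a single task; the service job gets one task
    // per worker (serviceTaskCount is an assumed input).
    public SessionModel(int serviceTaskCount)
    {
        BrokerJob.Tasks.Add(new TaskModel("broker"));
        for (int i = 0; i < serviceTaskCount; i++)
        {
            ServiceJob.Tasks.Add(new TaskModel("service worker " + (i + 1)));
        }
    }
}
```

    For example, new SessionModel(4) describes one session with a broker job holding one task and a service job holding four.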

    Let me know if you can locate the error after you go through these three steps.

    Cheers.
    Yiding
    Wednesday, September 17, 2008 1:09 AM

All replies

  • I'm forwarding this on to some of our SOA gurus to take a look at; hopefully they can get you an answer soon.


    -Josh
    Tuesday, September 16, 2008 8:51 PM
    Moderator
  • Thomas Barrett said:

    I've been trying to submit jobs to HPC using an EchoService and EchoClient that I created while referencing this Microsoft SOA whitepaper.  I followed the instructions to the dot, and deployed the service on each node in my cluster.  However, when I try to run the Echoservice Client application, I encounter an exception that cannot be handled (I've copied the exact code below, with the exception in bold).  
                // Create the session by calling the factory method
                using (Session session = Session.CreateSession(info))
                {
                    Console.WriteLine("Session's Endpoint Reference:{0}", session.EndpointReference.ToString());


    And what exception is being thrown?  Can you cut and paste the error message for me to look at?
    Wednesday, September 17, 2008 1:10 AM
    Answerer
  • Barndawgie, yidingz, and John,

    Thanks very much for your help with this. Thomas was my summer intern and is now back at University. I've figured out how to fix the problem, but need your help understanding why it was a problem in the first place.

    Here are the details:
    1. The Session.CreateSession(info) reports the following exception:
      Microsoft.Hpc.Scheduler.Session.SessionException occurred
        Message="An exception occurred when submitting the job, see inner exception"
        Source="Microsoft.Hpc.Scheduler.Session"
        StackTrace:
             at Microsoft.Hpc.Scheduler.Session.Session.EndCreateSession(IAsyncResult result)
             at EchoClient.Program.Main(String[] args) in C:\Program Files\Microsoft HPC Pack 2008 SDK\Samples\wcfbroker\HelloWorld\EchoClient\Program.cs:line 47
        InnerException: Microsoft.Hpc.Scheduler.Properties.SchedulerException
             Message="Task 8.1 failed. Please check the failed task for more details on the failure."
             Code=-2147218980
             Params="8.1"
             InnerException:

    2. I then went to HPC Cluster Manager and found two failed jobs (errors) for each Visual Studio exception:

      1. EchoSvc - WCF service    Failed    XXXXX\Administrator    Normal    10/5/2008 9:52:57 PM    1-4 Nodes    Child job has finished.   
      2. EchoSvc - WCF service - Broker for service job 7    Failed    XXXXX\Administrator    Normal    10/5/2008 9:52:57 PM    1-1 Cores    Task 8.1 failed. Please check the failed task for more details on the failure.   

    3. Digging into error 2, I see the error: '"C:\Program Files\Microsoft Compute Cluster Pack\bin\HpcWcfBroker.exe"' is not recognized as an internal or external command, operable program or batch file. It turns out it's looking for HpcWcfBroker.exe in the wrong location. Checking my environment variables, %CCP_HOME% actually points to "C:\Program Files\Microsoft Compute Cluster Pack\".

    4. Re-mapping %CCP_HOME% to "C:\Program Files\Microsoft HPC Pack\" makes everything hunky-dory.

    5. So, I'm kind of surprised that the RTM HPC didn't work for SOA out of the box...is there a chance that my config is a result of having previously installed (and uninstalled) the earlier beta? (seems like HPC install should overwrite any required system variables). Should I do the same edits for CCP_INC, CCP_LIB32 and CCP_LIB64?
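
    A quick way to check for this kind of stale variable on a node is sketched below. It assumes, as in the task command line "%CCP_HOME%bin\HpcWcfBroker.exe", that CCP_HOME ends with a backslash. The class and method names are hypothetical, and the check has to run on the node whose environment you want to inspect.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper for illustration; not part of HPC Pack.
static class CcpHomeCheck
{
    // Mirrors the task command line "%CCP_HOME%bin\HpcWcfBroker.exe":
    // plain concatenation, so ccpHome must end with a backslash.
    public static string BrokerPath(string ccpHome)
    {
        return ccpHome + @"bin\HpcWcfBroker.exe";
    }

    // Returns a one-line diagnostic for this machine's CCP_HOME.
    public static string Check()
    {
        string home = Environment.GetEnvironmentVariable("CCP_HOME");
        if (string.IsNullOrEmpty(home))
        {
            return "CCP_HOME is not set";
        }
        string path = BrokerPath(home);
        return (File.Exists(path) ? "found: " : "MISSING: ") + path;
    }
}
```

    Calling Console.WriteLine(CcpHomeCheck.Check()) on the broker node reports whether the path that the broker task would launch actually exists there.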

    FYI, here are the details for errors 1 and 2:

    Error #1

      <?xml version="1.0" encoding="utf-8" ?>
      <Job Version="2.000" Id="7" Name="EchoSvc - WCF service" SubmitTime="10/6/2008 4:52:57 AM" CreateTime="10/6/2008 4:52:57 AM" StartTime="10/6/2008 4:52:57 AM" EndTime="10/6/2008 4:52:58 AM" ChangeTime="10/6/2008 4:52:57 AM" UnitType="Node" MinCores="1" MaxCores="4" MinSockets="1" MaxSockets="4" MinNodes="1" MaxNodes="4" RunUntilCanceled="false" IsExclusive="false" ErrorCode="-2147218986" ErrorParams="" State="Failed" PreviousState="Canceling" UserName="XXXXX\Administrator" JobType="Service" Priority="Normal" RequiredNodes="" IsBackfill="false" NextTaskNiceID="2" HasGrown="false" HasShrunk="false" OrderBy="" TaskLevelUpdateTime="10/6/2008 4:52:57 AM" MinMaxUpdateTime="10/6/2008 4:52:57 AM" ComputedMinNodes="1" ComputedMaxNodes="4" RequestCancel="None" RequeueCount="0" AutoRequeueCount="0" FailureReason="None" ServiceName="EchoSvc" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" ParentJobId="0" ChildJobId="8" NumberOfCalls="0" NumberOfOutstandingCalls="0" CallDuration="0" CallsPerSecond="0" FailOnTaskFailure="false" Preemptable="true" ProjectId="1" JobTemplateId="1" OwnerId="2" ClientSourceId="2" Project="" JobTemplate="Default" DefaultTaskGroupId="7" Owner="XXXXX\Administrator" ClientSource="EchoClient" xmlns="http://schemas.microsoft.com/HPCS2008/scheduler/">
      <Dependencies />
      <Tasks>
      <Task Version="2.000" Id="16" SubmitTime="10/6/2008 4:52:57 AM" CreateTime="10/6/2008 4:52:57 AM" ChangeTime="10/6/2008 4:52:57 AM" ErrorCode="0" ErrorParams="" State="Finished" PreviousState="Finished" ParentJobId="7" RequestCancel="None" Closed="false" RequeueCount="0" AutoRequeueCount="0" FailureReason="None" PendingReason="None" InstanceId="-1" RecordId="7" MinCores="1" MaxCores="1" MinSockets="1" MaxSockets="1" MinNodes="1" MaxNodes="1" IsExclusive="false" NiceId="1" CommandLine=""%CCP_HOME%bin\HpcServiceHost.exe"" IsRerunnable="false" HasCustomProps="false" IsParametric="true" StartValue="1" EndValue="4" IncrementValue="1" GroupId="7" ParentJobState="Failed" UnitType="Node" ParametricRunningCount="0" ParametricCanceledCount="0" ParametricFailedCount="0" ParametricQueuedCount="0" />
      </Tasks>
      </Job>

    Error #2
      <?xml version="1.0" encoding="utf-8" ?>
      <Job Version="2.000" Id="8" Name="EchoSvc - WCF service - Broker for service job 7" SubmitTime="10/6/2008 4:52:57 AM" CreateTime="10/6/2008 4:52:57 AM" StartTime="10/6/2008 4:52:57 AM" EndTime="10/6/2008 4:52:58 AM" ChangeTime="10/6/2008 4:52:57 AM" UnitType="Core" MinCores="1" MaxCores="1" MinSockets="1" MaxSockets="1" MinNodes="1" MaxNodes="1" RunUntilCanceled="false" IsExclusive="false" ErrorCode="-2147218980" ErrorParams="8.1" State="Failed" PreviousState="Running" UserName="XXXXX\Administrator" JobType="Broker" Priority="Normal" RequiredNodes="" IsBackfill="false" NextTaskNiceID="2" HasGrown="false" HasShrunk="false" OrderBy="" TaskLevelUpdateTime="10/6/2008 4:52:57 AM" MinMaxUpdateTime="10/6/2008 4:52:57 AM" ComputedMinCores="1" ComputedMaxCores="1" RequestCancel="None" RequeueCount="0" AutoRequeueCount="0" FailureReason="None" ServiceName="EchoSvc" PendingReason="None" AutoCalculateMax="false" AutoCalculateMin="false" ParentJobId="7" ChildJobId="0" NumberOfCalls="0" NumberOfOutstandingCalls="0" CallDuration="0" CallsPerSecond="0" FailOnTaskFailure="false" Preemptable="true" ProjectId="1" JobTemplateId="1" OwnerId="2" ClientSourceId="1" Project="" JobTemplate="Default" DefaultTaskGroupId="8" Owner="XXXXX\Administrator" ClientSource="unknown" xmlns="http://schemas.microsoft.com/HPCS2008/scheduler/">
      <Dependencies />
      <Tasks>
      <Task Version="2.000" Id="21" SubmitTime="10/6/2008 4:52:57 AM" CreateTime="10/6/2008 4:52:57 AM" StartTime="10/6/2008 4:52:57 AM" EndTime="10/6/2008 4:52:58 AM" ChangeTime="10/6/2008 4:52:57 AM" ErrorCode="-2147218979" ErrorParams="1" State="Failed" PreviousState="Dispatching" ParentJobId="8" ExitCode="1" RequestCancel="None" Closed="false" RequeueCount="0" AutoRequeueCount="0" FailureReason="None" PendingReason="None" InstanceId="0" Output="'"C:\Program Files\Microsoft Compute Cluster Pack\bin\HpcWcfBroker.exe"' is not recognized as an internal or external command, operable program or batch file." RecordId="8" MinCores="1" MaxCores="1" IsExclusive="false" NiceId="1" CommandLine=""%CCP_HOME%bin\HpcWcfBroker.exe"" HasCustomProps="false" IsParametric="false" GroupId="8" ParentJobState="Failed" UnitType="Core" ParametricRunningCount="0" ParametricCanceledCount="0" ParametricFailedCount="0" ParametricQueuedCount="0">
      <EnvironmentVariables>
      <Variable>
      <Name>HPCWCFBROKER_SHARESESSION</Name>
      <Value>False</Value>
      </Variable>
      <Variable>
      <Name>HPCWCFBROKER_SECURE</Name>
      <Value>True</Value>
      </Variable>
      <Variable>
      <Name>HPCWCFBROKER_TRANSPORTSCHEME</Name>
      <Value>NetTcp</Value>
      </Variable>
      <Variable>
      <Name>CCP_PARENTJOBID</Name>
      <Value>7</Value>
      </Variable>
      </EnvironmentVariables>
      </Task>
      </Tasks>
      </Job>


    • Edited by David Cuccia Monday, October 6, 2008 5:42 AM formatting
    Monday, October 6, 2008 5:41 AM
  • I changed the WCF service to run under the Local System account instead of specifying an account in the service panel.
    Sunday, November 10, 2019 6:19 PM