HPC 2008 R2 - Job finishes at 0% with message "Broker shut down this service host when shrinking session's resource allocation"

  • Question

  • Hi,

    We have a WCF service deployed to an HPC 2008 R2 grid. Intermittently, a job finishes at 0% without completing, with the message "Broker shut down this service host when shrinking session's resource allocation".

    The following code snippet shows how our client calls the service endpoint.

    public static Session CreateNonDurableSession(string serviceName, int numberOfTasks, string headNode, string jobTemplate, string jobServiceName)
    {
        Flogger.Debug(">CreateNonDurableSession");

        // Request up to numberOfTasks cores for the session's service job.
        SessionStartInfo sessionInfo = new SessionStartInfo(headNode, serviceName);
        sessionInfo.SessionResourceUnitType = SessionUnitType.Core;
        sessionInfo.MaximumUnits = numberOfTasks;
        sessionInfo.ServiceJobName = serviceName; // jobServiceName is not used here
        sessionInfo.JobTemplate = jobTemplate;

        Session ndSession = null;
        try
        {
            ndSession = Session.CreateSession(sessionInfo);
        }
        catch (System.Exception ex)
        {
            // The exception is logged and swallowed, so ndSession stays null on failure.
            Flogger.Error(ex.Message, ex);
        }

        Flogger.DebugFormat("Created session for service : {0}", serviceName);
        Flogger.DebugFormat("MaximumUnits= {0}", numberOfTasks.ToString());

        return ndSession;
    }

    SessionBase clientSession = CreateNonDurableSession(ConfigHelper.GetWorkerServiceName(), nMaximumUnits, jobConfig.m_HeadNode, jobConfig.m_HpcJobTemplate, serviceJobName);

    NetTcpBinding m_Binding = new NetTcpBinding(SecurityMode.Transport);
    m_Binding.ReceiveTimeout = new TimeSpan(1, 0, 0, 0);

    BrokerClient<IHPCWorkerService> _Client = new BrokerClient<IHPCWorkerService>(clientSession, m_Binding);
    ...
    _Client.SendRequest(workerRequest);

    What could the reason be for the job to finish at 0%?

    Thanks,

    Prashant


    Thursday, October 30, 2014 3:49 AM

All replies

  • Hi Prashant,

    To investigate the error, you may need to collect the HPC logs. For instructions on collecting logs, please refer to

    http://technet.microsoft.com/en-us/library/jj680669.aspx#BKMK_V4

    Other things to check:

    1. Make sure you have enough resources in your cluster, and that the nodes are online and in a healthy state.

    2. Submit a simple job, for example one whose command line is just "ping localhost", and see whether it succeeds.

    3. Compare your code with the HPC SOA sample code to see whether there are any differences.
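
    For check 2, the test job could be submitted from C# roughly as follows. This is only a minimal sketch: it assumes the Microsoft.Hpc.Scheduler client assemblies are referenced, and the head node name "HEADNODE" and the job name are placeholders, not values from this thread.

    ```csharp
    using System;
    using Microsoft.Hpc.Scheduler;

    class PingSmokeTest
    {
        static void Main(string[] args)
        {
            // Placeholder head node name; pass the real one as the first argument.
            string headNode = args.Length > 0 ? args[0] : "HEADNODE";

            // Connect to the job scheduler on the head node.
            Scheduler scheduler = new Scheduler();
            scheduler.Connect(headNode);

            // Create a job with a single task that runs "ping localhost".
            ISchedulerJob job = scheduler.CreateJob();
            job.Name = "Ping smoke test"; // hypothetical job name
            ISchedulerTask task = job.CreateTask();
            task.CommandLine = "ping localhost";
            job.AddTask(task);

            // Null user name and password: use cached credentials or prompt for them.
            scheduler.SubmitJob(job, null, null);
            Console.WriteLine("Submitted job {0}", job.Id);
        }
    }
    ```

    If even this job does not run to completion, the cause is more likely in the cluster itself than in the SOA client code.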

    Thanks,

    Evan

    Friday, October 31, 2014 8:57 AM
  • Please also note that support for HPC 2008 R2 has ended; upgrading to Windows HPC Pack 2012 R2 is highly recommended.
    Friday, October 31, 2014 8:59 AM