Excel workbook offloading job tasks cannot be dispatched to compute nodes

  • Question

  • I have an HPC 2008 R2 cluster with HPC Services for Excel 2010 deployed on all nodes. When I submit a job, I can see the SOA session being created. On the broker node, the log shows:

    Session 4482 has been sucessfully created.

    On the compute nodes, I can see the following log entries:

    Servicehost is started.

    The service assembly is loaded.

    I can also see the HpcServiceHost processes on the compute node.

    However, the calculation requests never reach the compute node, and I don't see any Excel instance launched there. I also found many error log entries on the broker node like this:

    [Session:4482] [Dispatcher] .ReceiveRequest: EndpointNotFoundException happens in client Service Client (Faulted) 0fc7ad40-6ea0-4903-b6b7-07650415cbc3, net.tcp://private.dpc02:9101/4482/24762/_defaultEndpoint, message id = d195d055-dda4-4615-a4ba-19881fe198cf

    ("dpc02" is the name of the compute node).

    I ran all the diagnostic tests, and all Excel-related tests passed.

    The Firewall test warning is because the firewall is disabled on the compute node.

    The Services Running test failed because WDS was not running on the broker (head) node; all other services were running. The Services Running test passed on the compute node.

    I think the job failed because something is blocking the compute node from receiving calculation requests from the broker node, but I still have no idea why the broker node cannot communicate with the compute node.

    McAfee is installed on the nodes, and I can see its firewall core service running, but the IT division says nothing is blocked and all ports are open.

    This really makes me mad. Does anyone have any idea?

    Thursday, November 19, 2015 6:01 AM

Answers

  • I noticed that the service host URL is net.tcp://private.dpc02:9101/4482/24762/_defaultEndpoint, so I suppose the cluster has a private network in addition to the enterprise network. In this network configuration, SOA traffic goes through the private network by default. If the firewall already has all the ports open, the private network connection should also be checked. Could you ping private.dpc02 to see whether the private network is connected between the broker node and the compute node?
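
    As a rough illustration of the check above, the sketch below pulls the host name out of the endpoint URL from the broker log, splits off the network prefix, and tests whether the prefixed name resolves on the machine where it runs. The function names are hypothetical, and treating the first label of the host name as the network prefix is an assumption based on the URL in this thread; none of this is an HPC Pack API.

```python
import socket
from urllib.parse import urlsplit


def parse_service_endpoint(url):
    """Split a net.tcp service host URL into network prefix, node name and port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: a host such as "private.dpc02" is "<network prefix>.<node>".
    prefix, sep, node = host.partition(".")
    if not sep:  # no prefix present, so the whole host is the node name
        prefix, node = None, host
    return {"host": host, "network_prefix": prefix, "node": node, "port": parts.port}


def resolves(hostname):
    """Return True if the name resolves to an address on this machine."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Endpoint copied from the broker log quoted in the question.
endpoint = "net.tcp://private.dpc02:9101/4482/24762/_defaultEndpoint"
info = parse_service_endpoint(endpoint)
print(info)
```

    Run on the broker node, `resolves(info["host"])` returning False would suggest that the private name cannot even be looked up there. A True result only shows that the name resolves, so the ping is still worth doing.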

    It is also possible to switch the default network from private to enterprise. Running 'cluscfg listenvs' would show a cluster-wide environment variable named WCF_NETWORKPREFIX; then run 'cluscfg setenvs WCF_NETWORKPREFIX=Enterprise' to switch to the enterprise network if the private network has a connectivity problem.

    BR,

    Yutong Sun

    • Marked as answer by MChen19th Monday, November 23, 2015 4:33 AM
    Friday, November 20, 2015 7:59 AM

All replies

  • I also tried telnet on ports 9100~9163 (these are the ports the broker service uses to dispatch calculation requests to HpcServiceHost), and they are blocked!

    Maybe I should ask the IT division to check again, or to provide me with some logs of incoming connections on the compute nodes...
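
    The telnet test above can be repeated across the whole range with a small script. This is a minimal sketch: `probe_ports` and `BROKER_DISPATCH_PORTS` are hypothetical names, and the 9100 to 9163 range is taken from this post. A refused or timed-out connection only shows that nothing answered; it does not distinguish a firewall block from a port that has no listener at that moment.

```python
import socket

# Port range mentioned in this thread for broker-to-HpcServiceHost dispatch.
BROKER_DISPATCH_PORTS = range(9100, 9164)


def probe_ports(host, ports, timeout=1.0):
    """Try a TCP connect to each port; map port -> True if the connect succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, or name not resolved
            results[port] = False
    return results


def open_ports(results):
    """Return the sorted list of ports that accepted a connection."""
    return sorted(port for port, is_open in results.items() if is_open)
```

    For example, `open_ports(probe_ports("private.dpc02", BROKER_DISPATCH_PORTS))` run from the broker node while a session is active would list the ports that accepted a connection.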

    Thursday, November 19, 2015 7:13 AM
  • That's probably the cause: the required ports are blocked by the firewalls on the compute nodes. This could be reproduced with a simple SOA client and service.

    For a complete list of the ports required for HPC services, this link is a good reference (though it is for HPC Pack 2012 / R2).

    By the way, if you enable the compute node role on the head node/broker node (with Excel 2010 also installed) and take all other compute nodes offline, you may be able to work around the firewall issue and test the Excel service on this single machine for validation purposes.

    BR,

    Yutong Sun

    Thursday, November 19, 2015 1:46 PM
  • Thanks for the reply. Yes, taking all other nodes offline did get the job done successfully, but it is very slow, and we want dozens of nodes involved in the calculation.

    We've checked that all ports are open (I made a mistake: the HpcServiceHost processes on the compute nodes had closed after a while, so they were not running when I tried telnet from the broker node). But still no requests reached the HpcServiceHost processes at all.

    Could something be wrong with the broker worker? I can see HPC Broker(Out) listed in Windows Firewall, and it's enabled; all addresses, protocols, and ports are allowed.

    Friday, November 20, 2015 1:54 AM
  • Microsoft.Hpc.Excel.ExcelDriver: provides a wrapper around the Excel Primary Interop Assembly (PIA) interface that enables managed code to interact with a Microsoft Office application's COM-based Microsoft.Hpc.Excel.ExcelDriver, which allows a user to open a workbook, launch an Excel process, and invoke a macro.
    Friday, November 20, 2015 6:07 AM
  • Yeah, I didn't realize that it had the "private" prefix. Switching it to Enterprise resolved the problem. Thanks a lot.
    Monday, November 23, 2015 4:34 AM