October 13, 2011 9:32
I see many instances of this error message in Event Viewer -> Microsoft - HPC - Scheduler - Operational: "RemotingCommunicator: An unexpected error occurred trying to start task RemotingCommunicator.1663 on node xxx".
I wrote xxx because the error appears many times with a different number after "on node". Note that I have only two nodes in my cluster (one is both the head node and a compute node).
Can someone please tell me why this error occurs?
Thank you very much
October 14, 2011 16:53
Could you share details of your cluster configuration? Here's what we are interested in:
- Number, type of nodes and their hardware specs,
- Networking topology and hardware,
- Location of the database and version of the SQL Server.
October 18, 2011 18:46
This type of error message is usually related to communication problems between the Scheduler and the cluster nodes. These issues might be network related or could be caused by, for example, nodes being rebooted during job or task execution.
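To see whether the failures concentrate on one node, it may help to tally the events by the value after "on node". The following is only a rough sketch, assuming the events have been exported from Event Viewer as plain text with one message per line. The function name `count_failures_by_node` and the regular expression are illustrative assumptions based solely on the message quoted in this thread; they are not part of any HPC tool.

```python
import re
from collections import Counter

# Assumption: the pattern mirrors the message text quoted in this thread.
PATTERN = re.compile(
    r"trying to start task RemotingCommunicator\.(\d+) on node (\S+)"
)


def count_failures_by_node(lines):
    """Return a Counter mapping node identifier -> number of failed task starts."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Drop a trailing quote or period left over from the exported text.
            node = match.group(2).rstrip('".')
            counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; the node values here are made up.
    sample = [
        "RemotingCommunicator: An unexpected error occurred trying to "
        "start task RemotingCommunicator.1663 on node 12",
        "RemotingCommunicator: An unexpected error occurred trying to "
        "start task RemotingCommunicator.1664 on node 7",
    ]
    for node, total in count_failures_by_node(sample).most_common():
        print(node, total)
```

If one node accounts for most of the events, that points toward that node or its network path; an even spread would suggest looking at the head node or the network as a whole.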
October 19, 2011 13:46
I will ask a colleague for more details to post here.
October 24, 2011 14:46
Hi,
We have a cluster of 3 servers (one is head/compute node, the other 2 are compute nodes).
The servers are HP BL685 with HP onboard NICs. They are in different subnets (02, 03 and 04), connected to 1 Gbit switches and running at 1 Gbit full duplex.
The database is local to the server that acts as the head node. We use the SQL Express that comes with HPC to host the database.