February 27, 2012, 9:47 PM
I have built a small HPC cluster:
1 head node
2 compute nodes
I chose topology 5, in which all nodes are connected directly to the network.
In Cluster Manager, all nodes are reported as online and healthy.
I can remote into all nodes, and every node can ping every other node.
When I try to run any diagnostics, they all fail to start. The error given is:
internal exception happened when deal with run : the network name cannot be found
Steps taken to build the cluster:
Installed Windows Server 2008 HPC Edition on all nodes.
Installed HPC Pack on the head node, in the head node role.
Installed HPC Pack on each compute node, in the compute node role.
Everything seemed to install correctly.
February 28, 2012, 12:52 AM
Could you provide any logs from your diagnostics? Which diagnostics are you running?
Can you do "clusrun dir" from an elevated cmd window on your headnode?
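If it helps to have something to paste into the thread, the output of that command can be captured to a file. The following is only a rough sketch: it assumes Python is available on the head node and that `clusrun` is on the PATH of an elevated session. The helper name `run_and_capture` and the log file name are made up for illustration.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (exit_code, combined_output).

    Returns None if the executable cannot be started at all,
    for example when it is not on the PATH.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same text
            universal_newlines=True,
        )
    except OSError:
        return None
    return result.returncode, result.stdout


if __name__ == "__main__":
    # "clusrun dir" is the command suggested above; run it from an
    # elevated prompt on the head node.
    outcome = run_and_capture(["clusrun", "dir"])
    if outcome is None:
        print("clusrun could not be started; is HPC Pack on the PATH?")
        sys.exit(1)
    code, output = outcome
    # Hypothetical log file name, chosen only for this example.
    with open("clusrun_dir.log", "w") as log:
        log.write(output)
    print("exit code:", code)
    print(output)
```

The exit code and the saved text together show whether the command reached the compute nodes or failed before contacting them.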
February 28, 2012, 12:58 AM
It's an old error whose cause is not so easy to find.
Can you ping your nodes by their FQDN? Try pinging the nodes by their FQDN from the head node, and pinging the head node by its FQDN from the nodes.
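A ping by FQDN tests name resolution and reachability at once. To look at the name resolution half on its own, something like the sketch below could be run on the head node and on each compute node. It assumes Python is installed there, and the node names in it are placeholders, not names from this cluster.

```python
import socket


def resolve_all(hostname):
    """Return the sorted list of distinct IP addresses for hostname.

    An empty list means the name could not be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def check_nodes(hostnames):
    """Map each name to the addresses it resolves to."""
    return {name: resolve_all(name) for name in hostnames}


if __name__ == "__main__":
    # Placeholder FQDNs; substitute the real head node and compute nodes.
    nodes = [
        "headnode.example.local",
        "node01.example.local",
        "node02.example.local",
    ]
    for name, addrs in check_nodes(nodes).items():
        status = ", ".join(addrs) if addrs else "DOES NOT RESOLVE"
        print(name, "->", status)
```

A name that resolves to an unexpected address, or to different addresses depending on which node runs the check, would be worth reporting back here.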
February 28, 2012, 1:22 PM
There are no logs from the diagnostics because they never start. They just say "failed to run", the start time is blank, and there are no results in the lower panes.
February 28, 2012, 1:39 PM
I can ping all nodes from the head node using their FQDNs. The only thing I can think of is that there are two network cards in the head node blade, and it creates a virtual adapter as well. I have configured the network settings in HPC Manager to use each one of them in turn, and the result is the same: the tests fail to run and never start.
- Edited by dentek, February 28, 2012, 1:40 PM
February 29, 2012, 5:10 PM
I had a problem with our cluster when using two network interfaces (Ethernet and InfiniBand) at the same time. Tests failed and software behaved unpredictably. Sadly, I had to disable one of the interfaces, and then all the errors went away.
Maybe you can try disabling one of the interfaces?
Also, when you installed your cluster, which topology did you choose?
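Before disabling adapters, it may be worth checking whether the machine's own name resolves to more than one address, which can happen when several interfaces register themselves. This is a minimal sketch under that assumption. It is not a confirmed diagnosis of the error in this thread, and it assumes Python is available on the node.

```python
import socket


def ipv4_addresses(hostname):
    """Return the IPv4 addresses the resolver reports for hostname.

    An empty list means the lookup failed.
    """
    try:
        _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return []
    return addrs


def multihomed(name_to_addrs):
    """Return, sorted, the names that map to more than one distinct address."""
    return sorted(
        name for name, addrs in name_to_addrs.items() if len(set(addrs)) > 1
    )


if __name__ == "__main__":
    local = socket.gethostname()
    table = {local: ipv4_addresses(local)}
    print(table)
    for name in multihomed(table):
        print(name, "resolves to several addresses; check which adapter each belongs to")
```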
March 1, 2012, 9:23 PM
I have disabled all adapters except one, and the tests still fail with the same error. I will try removing HPC Pack 2008 and reinstalling it to see whether that changes the result.
I chose topology 5, all nodes directly connected to the enterprise network (or "all nodes connected to the network"). Each server/blade has only one active network card at present.
Thanks for the help.