27 February 2012 21:47
I have built a small HPC cluster:
1 head node
2 compute nodes
Topology 5, where all nodes are connected directly to the network, was chosen.
In Cluster Manager all nodes are reported as online and healthy.
I can remote into all nodes and can ping all nodes from each other.
When I try to run any diagnostics, they all fail to start. The error given is:
internal exception happened when deal with run : the network name cannot be found
Steps taken to build:
Installed Windows Server 2008 HPC Edition on all nodes.
Installed HPC Pack on the head node as a head node.
Installed HPC Pack on each compute node as a compute node.
Everything seemed to install correctly.
28 February 2012 00:52
Could you provide any logs from your diagnostics? Which diagnostics are you running?
Can you run "clusrun dir" from an elevated cmd window on your head node?
28 February 2012 00:58
This is an old error, and its cause is not easy to track down.
Can you ping your nodes by their FQDN? Try pinging the nodes by their FQDN from the head node, and pinging the head node by its FQDN from the nodes.
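The FQDN check suggested above can be scripted so that it runs the same way on the head node and on each compute node. What follows is only a minimal sketch, assuming Python is available on the machines. The function names `resolve_all` and `unresolved` are made up for illustration, and the host names are whatever you pass on the command line. It tests name resolution only, not ICMP reachability, so it complements ping rather than replacing it.

```python
import socket
import sys


def resolve_all(names, resolver=socket.gethostbyname):
    """Map each name to the IPv4 address it resolves to, or None on failure.

    `resolver` is injectable so the function can be exercised without DNS.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror derive from OSError
            results[name] = None
    return results


def unresolved(results):
    """Return the sorted list of names that did not resolve."""
    return sorted(name for name, addr in results.items() if addr is None)


if __name__ == "__main__":
    # Pass the FQDNs of your own nodes as arguments; with no arguments the
    # script falls back to this machine's own fully qualified name.
    hosts = sys.argv[1:] or [socket.getfqdn()]
    results = resolve_all(hosts)
    for name, addr in results.items():
        print(name, "->", addr if addr else "NOT RESOLVED")
    if unresolved(results):
        sys.exit(1)
```

Running it once on the head node with the compute node FQDNs, and once on each compute node with the head node FQDN, covers both directions of the check.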
28 February 2012 13:22
There are no logs from the diagnostics because they never start. They just say "failed to run", the start time is blank, and there are no results in the lower panes.
28 February 2012 13:39
I can ping all nodes from the head node using their FQDN. The only thing I can think of is that there are two network cards in the head node blade, and it creates a virtual adapter as well. I have configured the network settings in HPC Manager to use each one of them in turn, and the result is the same: "failed to run", and the tests never start.
- Edited by dentek 28 February 2012 13:40
29 February 2012 17:10
I had a problem with our cluster when it used two network interfaces (Ethernet and InfiniBand) at the same time. Tests failed and software behaved unpredictably. Sadly, I had to disable one of the interfaces, after which all the errors went away.
Maybe you can try disabling one of the interfaces?
So, when you installed your cluster, which topology did you choose?
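One way to see whether several interfaces are involved is to list every IPv4 address a node's name resolves to. This is a rough sketch in Python, not an HPC Pack tool. The helper names `ipv4_addresses` and `is_multihomed` are invented for this example, and it assumes that a name resolving to more than one address is worth a closer look, which may not be the cause in every setup.

```python
import socket
import sys


def ipv4_addresses(name, getaddrinfo=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to.

    An empty list means the name could not be resolved at all.
    `getaddrinfo` is injectable so the function can be tested offline.
    """
    try:
        infos = getaddrinfo(name, None, socket.AF_INET)
    except OSError:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def is_multihomed(addresses):
    """True when a name maps to more than one distinct address."""
    return len(addresses) > 1


if __name__ == "__main__":
    # Pass node names as arguments; defaults to this machine's host name.
    hosts = sys.argv[1:] or [socket.gethostname()]
    for host in hosts:
        addrs = ipv4_addresses(host)
        note = " (multiple addresses)" if is_multihomed(addrs) else ""
        print(host, "->", ", ".join(addrs) or "NOT RESOLVED", note)
```

If a node shows more than one address, it may be worth checking that the other nodes can actually reach each address listed.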
1 March 2012 21:23
I have disabled all adapters except one, and the tests still fail with the same error. I will try removing the 2008 HPC Pack and reinstalling it to see whether that changes the result.
I chose topology 5 (all nodes directly connected to the enterprise network). Each server/blade has only one active network card at present.
Thanks for the help.