All diagnostics fail: Server 2008 HPC R2 SP1

    Question

  • Hello everyone.

    I have built a small HPC cluster:

    1 head node

    2 compute nodes

    Topology 5, where all nodes are connected directly to the enterprise network, was chosen.

    In Cluster Manager, all nodes are reported as online and healthy.

    I can remote into all nodes and can ping all nodes from each other.

    When I try to run any diagnostics, they all fail to run or fail to start. The error given is:

    internal exception happened when deal with run : the network name cannot be found

    Steps taken to build:

    Installed Windows Server 2008 HPC Edition on all machines.

    HPC Pack installed on the head node as a head node.

    HPC Pack installed on each compute node as a compute node.

    Everything seemed to install correctly.

    Any advice?

    Monday, February 27, 2012 9:47 PM

All replies

  • Hi dentek,

    Could you provide any logs from your diagnostics? Which diagnostics are you running?

    Can you do "clusrun dir" from an elevated cmd window on your headnode?

    Michael

    Tuesday, February 28, 2012 12:52 AM
  • It's an old error whose cause is not so easy to find.

    Can you ping your nodes by their FQDN? Try pinging the nodes by their FQDN from the head node, and pinging the head node by its FQDN from the nodes.
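
    If it helps, the name-resolution half of that check can be scripted. The following is only a rough sketch in Python, not an HPC Pack tool: it uses the standard `socket.getaddrinfo` call to see whether each name resolves and to which addresses. It does not send ICMP packets, so it complements ping rather than replacing it. The function name `check_names` and the `resolve` parameter are made up for this example.

```python
import socket


def check_names(names, resolve=socket.getaddrinfo):
    """Map each name to a sorted list of resolved addresses, or None on failure.

    `resolve` defaults to socket.getaddrinfo; it is a parameter only so the
    function can be exercised without a real DNS server.
    """
    results = {}
    for name in names:
        try:
            infos = resolve(name, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the IP address as a string.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

    Run it on the head node with the compute nodes' FQDNs, and on each compute node with the head node's FQDN, for example `check_names(["headnode.example.local"])` (a placeholder name). A value of `None` means the name did not resolve from that machine.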

    Tuesday, February 28, 2012 12:58 AM
  • There are no logs from the diagnostics because they never start. They just say "failed to run", the start time is blank, they never run, and there are no results in the lower panes.

    Tuesday, February 28, 2012 1:22 PM
  • I can ping all nodes from the head node using their FQDNs. The only thing I can think of is that there are two network cards in the head node blade, and it creates a virtual adapter as well. I have configured the network settings in HPC Manager to use each one of them, and the result is the same: the tests fail to run and never start.
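
    One way to see what several adapters look like from the naming side is to list the IPv4 addresses that the machine's own name resolves to. This is only a sketch using Python's standard `socket.gethostbyname_ex` and `socket.getfqdn`; the helper names `own_ipv4_addresses` and `describe` are invented here, and nothing in this thread establishes that multiple addresses are the cause of the error.

```python
import socket


def own_ipv4_addresses(lookup=socket.gethostbyname_ex, fqdn=None):
    """Return (name, sorted IPv4 addresses) for this machine's own name.

    `lookup` and `fqdn` are parameters only so the function can be tried
    with a stand-in resolver; by default the real resolver is used.
    """
    name = fqdn or socket.getfqdn()
    _canonical, _aliases, addresses = lookup(name)
    return name, sorted(addresses)


def describe(addresses):
    """Summarise whether the name maps to one address or to several."""
    if len(addresses) > 1:
        return "multiple addresses: " + ", ".join(addresses)
    if addresses:
        return "single address: " + addresses[0]
    return "no addresses"
```

    Calling `describe(own_ipv4_addresses()[1])` on the head node shows whether its name maps to one address or to several.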


    • Edited by dentek Tuesday, February 28, 2012 1:40 PM
    Tuesday, February 28, 2012 1:39 PM
  • Hello.

    I had a problem with our cluster using two network interfaces (Ethernet and InfiniBand) at the same time. Tests failed and software behaved unpredictably. Sadly, I had to disable one of the interfaces, and then all the errors went away.

    Maybe you can try disabling one of the interfaces?

    So, when you installed your cluster, which topology did you choose?

    Wednesday, February 29, 2012 5:10 PM
  • I have disabled all adapters except one, and the tests still fail with the same error. I will try removing the 2008 HPC Pack and reinstalling it to see whether that changes the result.

    I chose topology 5, with all nodes directly connected to the enterprise network (that is, all nodes connected only to the network). Each server/blade has only one active network card at present.

    Thanks for the help.

    Thursday, March 01, 2012 9:23 PM