Answered: Headnode alternates between unreachable and ready

  • June 4, 2008, 15:25 Soon Heng
     

    Hi,

     

    I connect to the headnode over MS RDP and use the compute management MMC to manage a compute cluster of two compute nodes plus one headnode. However, after operating for a while, the headnode becomes unreachable and my MMC disconnects. Then, after about 5 minutes, the headnode becomes reachable again. Can anyone advise what went wrong?

     

    I have three networks.

    Public network

    MPI network

    Private network

     

    1. What should the network binding order be? Currently, mine is public, MPI, and private. Could this be the cause of the problem?

     

    2. All three of my nodes (1 HN and 2 CN) have 8 cores each, for a total of 24 cores. When I submit a job and choose 16 cores, everything runs in less than 1 minute. When it goes beyond 16 cores, it seems to run forever. Is there any way to tell where the compute cluster is hung?

     

Answer

  • May 22, 2009, 20:52 Don Pattee (MSFT, Moderator)
     Answered
    Hopefully your issue has been resolved, since it was posted so long ago. If you are still encountering the problem, please start a new thread on the forum. We weren't that great at managing our forum in the past, and I apologize for that, but we've made serious improvements in handling it and will get to all the new posts now.
