HPC Pack 2016, Network Direct RDMA not enabled

  • Question

  • Hello,

    I have successfully installed a cluster running HPC Pack 2016 U2. The compute nodes are Windows Server 2012 R2 machines and the head node is a Windows Server 2016 machine. InfiniBand is used as the application network; for this, every server has a Mellanox ConnectX-3 InfiniBand adapter with the latest firmware and driver. In the network adapter's advanced settings, RDMA is enabled.
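
    For reference, this is roughly how I verify RDMA on each node in PowerShell on Server 2012 R2 (just a sketch using the built-in networking cmdlets):

        Get-NetAdapterRdma                                   # the ConnectX-3 adapter should show Enabled = True
        Get-NetOffloadGlobalSetting                          # NetworkDirect should report Enabled
        Set-NetOffloadGlobalSetting -NetworkDirect Enabled   # re-enable it globally if it is off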

    The problem is that in HPC Cluster Manager, under Resource Management, the head node is the only one for which the "Network Direct" option shows as true. To my understanding, this means that RDMA only works on the head node.

    Can anyone help me find out what is misconfigured on the compute nodes and why Network Direct is shown as false there?

    Are there any options to configure RDMA in MS HPC Pack 2016?

    Thanks

    Andi


    Friday, January 11, 2019 12:00 PM

All replies

  • Could you try running the MPI pingpong test on the compute nodes to check your current application network latency and throughput?
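
    For example, something along these lines from a PowerShell prompt on the head node (just a sketch; the node names are placeholders, and mpipingpong may require additional options depending on your HPC Pack version, so check its built-in help):

        mpiexec -hosts 2 NODE01 1 NODE02 1 mpipingpong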

    And this doc, https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-hpc-server-2008/dd391828(v=ws.10), tells you how to enable Network Direct through ndinstall.exe.
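
    For instance, you could check and (re)register the NetworkDirect providers on all compute nodes from the head node (a rough sketch; the ComputeNodes node group name is an assumption, adjust it to your cluster):

        clusrun /nodegroup:ComputeNodes ndinstall -l     # list the registered ND providers on every compute node
        clusrun /nodegroup:ComputeNodes ndinstall -i     # register the ND providers where they are missing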


    Qiufang Shi

    Monday, January 14, 2019 2:41 AM
  • I can't run the MPI pingpong test because it does not start. After clicking the run button, the test fails without producing a results file. None of the tests will start.

    By the way, only the head node is a domain-joined server; the compute nodes are non-domain servers. But MPI applications (like ANSYS) work perfectly with MS HPC Pack, so there should be no user rights problem.

    The ndinstall -l command shows my Mellanox device: 0000001011 - OpenFabrics ..... and 0000001012 - OpenFabrics ......

    Are there any other settings to check?

    Monday, January 14, 2019 10:06 AM