Typical MPI ping-pong latency over RoCE with NetworkDirect

  • Question

  • Hello,

    I have set up a RoCE connection between two workstations using two Mellanox ConnectX-4 FDR cards (no switch) and the latest WinOF-2 drivers. The latency tests included in the driver package show link speeds up to the nominal 56 Gb/s and latencies of about 2 us. When I run an mpipingpong test between the nodes, I get latencies at least 10x greater:

    mpiexec.exe /hosts 2 192.168.2.1 1 192.168.2.2 1 /env MPICH_NETMASK 192.168.2.0/255.255.255.0 /env MSMPI_ND_ZCOPY_THRESHOLD -1 /env MSMPI_DISABLE_ND 0 /env MSMPI_ND_ENABLE_FALLBACK 0 /env MSMPI_PRECONNECT all /affinity /priority 1 mpipingpong -r -op -pc


    Testing packet sizes 4 to 128 bytes (1024 iterations per link)

    Ping Node        Pong Node        Size (bytes)  Latency (usec)  Throughput (MB/sec)
    -----------------------------------------------------------------------------------
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5             4          28.301                0.135
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8             4          27.181                0.140
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5             8          38.703                0.197
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8             8          38.614                0.198
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5            16          38.159                0.400
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8            16          38.934                0.392
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5            32          38.602                0.791
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8            32          40.199                0.759
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5            64          38.509                1.585
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8            64          39.120                1.560
    DESKTOP-C61LVC8  DESKTOP-MG0GUD5           128          39.016                3.129
    DESKTOP-MG0GUD5  DESKTOP-C61LVC8           128          38.388                3.180
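    (For reference, the latency mpipingpong reports is essentially half the round-trip time of a blocking send/receive exchange. A minimal sketch of that kind of measurement with plain MPI calls is below; the message size, iteration count, and two-rank assumption are arbitrary illustration choices, not the mpipingpong defaults.)

    /*
     * Minimal MPI ping-pong latency sketch (illustration only, not the
     * mpipingpong source). Assumes exactly two ranks; message size and
     * iteration count are arbitrary.
     *
     * Build against the MS-MPI SDK and run with:
     *   mpiexec /hosts 2 <host1> 1 <host2> 1 pingpong.exe
     */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE   8
    #define ITERATIONS 1024

    int main(int argc, char **argv)
    {
        char buf[MSG_SIZE] = { 0 };
        int rank, peer, i;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;

        /* Warm-up exchange so connection establishment is not timed. */
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < ITERATIONS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        /* Half the measured round trip is the one-way ping-pong latency. */
        if (rank == 0)
            printf("average one-way latency: %.3f usec\n",
                   (t1 - t0) * 1e6 / (2.0 * ITERATIONS));

        MPI_Finalize();
        return 0;
    }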

    What is the optimum MPI latency that I should expect with a NetworkDirect connection over RoCE if everything is configured properly?

    Best Regards,

    Costas

    Tuesday, April 17, 2018 7:47 AM

Answers

  • Hi Costas,

    You should expect full performance when going through Network Direct, so something is definitely off. From your latencies I would guess you are running over TCP/IP. You may need to enable the Network Direct provider explicitly if the driver install didn't do so. To confirm, you can force MS-MPI to use Network Direct or fail by adding `-env MSMPI_DISABLE_SOCK 1` to your mpiexec command line.

    I believe the Mellanox drivers include a utility named `ndinstall.exe`; pass it the -i flag to install the providers and the -l (lowercase L) flag to list the installed ones, though I haven't done this myself in quite a while, so things may have changed. Note that the inbox drivers included with Windows do not include the user-mode Network Direct driver components.
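    For what it's worth, my understanding is that ndinstall registers the NDv1/NDv2 providers in the ordinary Winsock service-provider catalog, so you can also dump that catalog yourself to confirm the entries are there. A rough sketch using the standard Winsock SPI call WSCEnumProtocols; whether the ND entries show up exactly as ndinstall prints them is an assumption on my part:

    /*
     * Rough sketch: dump the Winsock service-provider catalog, which should
     * contain entries similar to what `ndinstall -l` prints. Windows-only;
     * link against ws2_32.lib.
     */
    #include <winsock2.h>
    #include <ws2spi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment(lib, "ws2_32.lib")

    int main(void)
    {
        WSADATA wsa;
        DWORD len = 0;
        INT err = 0;
        int count, i;
        WSAPROTOCOL_INFOW *info;

        /* Not strictly needed for the SPI enumeration, but harmless. */
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            return 1;

        /* First call with a NULL buffer reports the required size in 'len'. */
        WSCEnumProtocols(NULL, NULL, &len, &err);
        info = (WSAPROTOCOL_INFOW *)malloc(len);
        if (info == NULL)
            return 1;

        count = WSCEnumProtocols(NULL, info, &len, &err);
        if (count == SOCKET_ERROR) {
            fwprintf(stderr, L"WSCEnumProtocols failed: %d\n", err);
            return 1;
        }

        /* Print catalog ID and protocol name, one entry per line. */
        for (i = 0; i < count; i++)
            wprintf(L"%010lu - %ls\n",
                    (unsigned long)info[i].dwCatalogEntryId, info[i].szProtocol);

        free(info);
        WSACleanup();
        return 0;
    }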

    Cheers,
    -Fab

    • Marked as answer by cyamin Wednesday, April 18, 2018 5:48 AM
    Tuesday, April 17, 2018 4:26 PM

All replies

  • Hello Fab,

    Thank you for your insight. You are right about TCP/IP. Running with "-env MSMPI_DISABLE_SOCK 1" failed:

    C:>mpiexec.exe /hosts 2 192.168.2.1 192.168.2.2 /env MSMPI_ND_ZCOPY_THRESHOLD -1 /env MSMPI_DISABLE_ND 0 /env MSMPI_DISABLE_SOCK 1 /affinity /priority 1 mpipingpong -r -op -pc
    
    [ DESKTOP-MG0GUD5#1 ] Fatal Error: MPI Failure
    [ DESKTOP-MG0GUD5#1 ] Error Details:
    [ DESKTOP-MG0GUD5#1 ]   Other MPI error, error stack:
    [ DESKTOP-MG0GUD5#1 ]   MPI_Allgather(sbuf=0x0000000000E7F940, scount=128, MPI_CHAR, rbuf=0x0000000003561EC0, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
    [ DESKTOP-MG0GUD5#1 ]   unable to connect to 192.168.2.1 192.168.0.15 DESKTOP-C61LVC8  on port 0, the socket interconnect is disabled
    
    
    [ DESKTOP-C61LVC8#0 ] Fatal Error: MPI Failure
    [ DESKTOP-C61LVC8#0 ] Error Details:
    [ DESKTOP-C61LVC8#0 ]   Other MPI error, error stack:
    [ DESKTOP-C61LVC8#0 ]   MPI_Allgather(sbuf=0x0000000000D8FD50, scount=128, MPI_CHAR, rbuf=0x0000000000DE1B00, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
    [ DESKTOP-C61LVC8#0 ]   unable to connect to 192.168.2.2 192.168.0.24 DESKTOP-MG0GUD5  on port 0, the socket interconnect is disabled
    
    
    job aborted:
    [ranks] message
    
    [0] application aborted
    aborting MPI_COMM_WORLD (comm=0x44000000), error 250, comm rank 0
    
    [1] application aborted
    aborting MPI_COMM_WORLD (comm=0x44000000), error 250, comm rank 1
    
    ---- error analysis -----
    
    [0] on 192.168.2.1
    mpipingpong aborted the job. abort code 250
    
    [1] on 192.168.2.2
    mpipingpong aborted the job. abort code 250
    
    ---- error analysis -----

    ndinstall runs without any problems:

    C:\Windows\system32>ndinstall -i
    
    Installing mlx5nd provider: already installed
    
    Installing mlx5nd2 provider: already installed
    
    Current providers:
            0000001001 - Hyper-V RAW
            0000001006 - MSAFD Tcpip [TCP/IP]
            0000001007 - MSAFD Tcpip [UDP/IP]
            0000001008 - MSAFD Tcpip [RAW/IP]
            0000001009 - MSAFD Tcpip [TCP/IPv6]
            0000001010 - MSAFD Tcpip [UDP/IPv6]
            0000001011 - MSAFD Tcpip [RAW/IPv6]
            0000001002 - RSVP TCPv6 Service Provider
            0000001003 - RSVP TCP Service Provider
            0000001004 - RSVP UDPv6 Service Provider
            0000001005 - RSVP UDP Service Provider
            0000001012 - MSAFD Irda [IrDA]
            0000001013 - NDv1 Provider for Mellanox WinOF-2
            0000001014 - NDv2 Provider for Mellanox WinOF-2

    I don't understand the part about the inbox drivers not including the user-mode Network Direct components. How does that affect my case, and is there anything I can do about it?

    I am using the latest Mellanox drivers (WinOF-2 v1.9), the latest firmware, and the latest MS-MPI (v9.0.1). I would appreciate any other suggestions regarding my setup.

    Regards,

    Costas


    Wednesday, April 18, 2018 5:48 AM