Typical MPI ping-pong latency over RoCE with NetworkDirect

Question
-
Hello,
I have set up a RoCE connection between two workstations using two Mellanox ConnectX-4 FDR cards (no switch) and the latest WinOF-2 drivers. The latency tests included in the driver package show link speeds up to the nominal 56 Gb/s and latencies of about 2 µs. When running an mpipingpong test between the nodes, I get latencies at least 10x greater:
mpiexec.exe /hosts 2 192.168.2.1 1 192.168.2.2 1 /env MPICH_NETMASK 192.168.2.0/255.255.255.0 /env MSMPI_ND_ZCOPY_THRESHOLD -1 /env MSMPI_DISABLE_ND 0 /env MSMPI_ND_ENABLE_FALLBACK 0 /env MSMPI_PRECONNECT all /affinity /priority 1 mpipingpong -r -op -pc
Testing packet sizes 4 to 128 bytes (1024 iterations per link per size):

Packet size   Ping Node         Pong Node         Latency (usec)   Throughput (MB/sec)
---------------------------------------------------------------------------------------
          4   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           28.301                 0.135
          4   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           27.181                 0.140
          8   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           38.703                 0.197
          8   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           38.614                 0.198
         16   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           38.159                 0.400
         16   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           38.934                 0.392
         32   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           38.602                 0.791
         32   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           40.199                 0.759
         64   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           38.509                 1.585
         64   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           39.120                 1.560
        128   DESKTOP-C61LVC8   DESKTOP-MG0GUD5           39.016                 3.129
        128   DESKTOP-MG0GUD5   DESKTOP-C61LVC8           38.388                 3.180

What is the optimum MPI latency that I should expect with a NetworkDirect connection over RoCE if everything is configured properly?
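For context, my understanding is that the latency column above is roughly half the average round-trip time of a blocking send/receive pair at each packet size. A minimal sketch of an equivalent measurement against the standard MPI API (purely illustrative, not the actual mpipingpong source; the iteration count and message size are just placeholders matching the smallest case above) would be:

```c
/* Minimal ping-pong latency sketch for two ranks, assuming MS-MPI
 * (compile and link against mpi.h / msmpi.lib). Rank 0 reports half
 * of the average round-trip time in microseconds. */
#include <mpi.h>
#include <stdio.h>

#define ITERS    1024
#define MSG_SIZE 4      /* bytes; smallest packet size in the table above */

int main(int argc, char **argv)
{
    int rank = 0;
    char buf[MSG_SIZE] = { 0 };

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);            /* start both ranks together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)  /* half the average round trip, in microseconds */
        printf("latency: %.3f usec\n", (t1 - t0) / ITERS / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}
```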
Best Regards,
Costas
Tuesday, April 17, 2018 7:47 AM
Answers
-
Hi Costas,
You should expect full performance when going through Network Direct, so something is definitely off. From your latencies I would guess you are running over TCP/IP. You may need to enable the Network Direct provider explicitly if the driver install didn't do so. To confirm, you can force MS-MPI to use Network Direct or fail by adding `-env MSMPI_DISABLE_SOCK 1` to your mpiexec command line.
I believe the Mellanox drivers include a utility named `ndinstall.exe`: pass it the -i flag to install the providers and -l (lowercase L) to list the installed ones, though I haven't done this myself in quite a while, so things may have changed. Note that the inbox drivers included with Windows do not include the user-mode Network Direct driver components.
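If you want to double-check from your own code, my understanding is that `ndinstall -l` essentially lists the Winsock service-provider catalog, which you can also enumerate with the Win32 WSCEnumProtocols call. A rough sketch (just an illustration under that assumption; the file name is hypothetical and I haven't run this against WinOF-2 recently) might look like this:

```c
/* Sketch: enumerate the Winsock service-provider catalog and print each
 * entry, so you can check that the Mellanox NDv1/NDv2 providers are
 * registered. Build with:  cl listproviders.c ws2_32.lib */
#include <winsock2.h>
#include <ws2spi.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    DWORD len = 0;
    int err = 0;

    /* First call with a NULL buffer to learn the required size;
       this is expected to fail with WSAENOBUFS. */
    if (WSCEnumProtocols(NULL, NULL, &len, &err) == SOCKET_ERROR && err != WSAENOBUFS) {
        fprintf(stderr, "WSCEnumProtocols failed: %d\n", err);
        return 1;
    }

    LPWSAPROTOCOL_INFOW info = (LPWSAPROTOCOL_INFOW)malloc(len);
    if (info == NULL)
        return 1;

    int count = WSCEnumProtocols(NULL, info, &len, &err);
    if (count == SOCKET_ERROR) {
        fprintf(stderr, "WSCEnumProtocols failed: %d\n", err);
        free(info);
        return 1;
    }

    /* Look for entries such as "NDv2 Provider for Mellanox WinOF-2". */
    for (int i = 0; i < count; i++)
        wprintf(L"%010lu - %ls\n", info[i].dwCatalogEntryId, info[i].szProtocol);

    free(info);
    return 0;
}
```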
Cheers,
-Fab-
Marked as answer by cyamin, Wednesday, April 18, 2018 5:48 AM
Tuesday, April 17, 2018 4:26 PM
All replies
-
Hello Fab,
Thank you for your insight. You are right about TCP/IP. Running with `/env MSMPI_DISABLE_SOCK 1` failed:
C:\>mpiexec.exe /hosts 2 192.168.2.1 192.168.2.2 /env MSMPI_ND_ZCOPY_THRESHOLD -1 /env MSMPI_DISABLE_ND 0 /env MSMPI_DISABLE_SOCK 1 /affinity /priority 1 mpipingpong -r -op -pc

[ DESKTOP-MG0GUD5#1 ] Fatal Error: MPI Failure
[ DESKTOP-MG0GUD5#1 ] Error Details:
[ DESKTOP-MG0GUD5#1 ]   Other MPI error, error stack:
[ DESKTOP-MG0GUD5#1 ]   MPI_Allgather(sbuf=0x0000000000E7F940, scount=128, MPI_CHAR, rbuf=0x0000000003561EC0, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
[ DESKTOP-MG0GUD5#1 ]   unable to connect to 192.168.2.1 192.168.0.15 DESKTOP-C61LVC8 on port 0, the socket interconnect is disabled
[ DESKTOP-C61LVC8#0 ] Fatal Error: MPI Failure
[ DESKTOP-C61LVC8#0 ] Error Details:
[ DESKTOP-C61LVC8#0 ]   Other MPI error, error stack:
[ DESKTOP-C61LVC8#0 ]   MPI_Allgather(sbuf=0x0000000000D8FD50, scount=128, MPI_CHAR, rbuf=0x0000000000DE1B00, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
[ DESKTOP-C61LVC8#0 ]   unable to connect to 192.168.2.2 192.168.0.24 DESKTOP-MG0GUD5 on port 0, the socket interconnect is disabled

job aborted:
[ranks] message
[0] application aborted
aborting MPI_COMM_WORLD (comm=0x44000000), error 250, comm rank 0
[1] application aborted
aborting MPI_COMM_WORLD (comm=0x44000000), error 250, comm rank 1

---- error analysis -----
[0] on 192.168.2.1
mpipingpong aborted the job. abort code 250
[1] on 192.168.2.2
mpipingpong aborted the job. abort code 250
---- error analysis -----
ndinstall runs without problem:
C:\Windows\system32>ndinstall -i
Installing mlx5nd provider: already installed
Installing mlx5nd2 provider: already installed
Current providers:
        0000001001 - Hyper-V RAW
        0000001006 - MSAFD Tcpip [TCP/IP]
        0000001007 - MSAFD Tcpip [UDP/IP]
        0000001008 - MSAFD Tcpip [RAW/IP]
        0000001009 - MSAFD Tcpip [TCP/IPv6]
        0000001010 - MSAFD Tcpip [UDP/IPv6]
        0000001011 - MSAFD Tcpip [RAW/IPv6]
        0000001002 - RSVP TCPv6 Service Provider
        0000001003 - RSVP TCP Service Provider
        0000001004 - RSVP UDPv6 Service Provider
        0000001005 - RSVP UDP Service Provider
        0000001012 - MSAFD Irda [IrDA]
        0000001013 - NDv1 Provider for Mellanox WinOF-2
        0000001014 - NDv2 Provider for Mellanox WinOF-2
I don't quite follow the part about the user-mode Network Direct driver components in the inbox drivers. How does that affect my case, and is there anything I can do about it?
I am using the latest Mellanox drivers (WinOF-2 v1.9), the latest firmware, and the latest MS-MPI (v9.0.1). I would appreciate any other suggestions regarding my setup.
Regards,
Costas
Wednesday, April 18, 2018 5:48 AM