Poor performance with Infiniband

  • Question

  • Hello,

    We are using a Windows HPC 2008 R2 cluster for our research. The problem is that the Infiniband performance is rather poor.

    We are using two different node configurations, with Intel DS5400XS and Asus P6T7 SuperComputer motherboards and Xeon/i7 processors. Each node is equipped with one Mellanox Infinihost III Ex HCA (2 x 20 Gb/s); only one port is connected to the IB switch.

    The drivers are installed correctly, as is NetworkDirect.

    I wrote a simple program that sends data from one node to another using plain MPI_Send/MPI_Recv calls. The current time is written into an array before and after every transmission. When the transmissions have finished, the measurements are written to a file. I use this program to measure the actual bandwidth.
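
    The post-processing step of such a measurement can be sketched as follows. This is only an illustration of how the timestamp array could be turned into bandwidth figures, not the original program: the function name `bandwidth_gbps` and the sample timestamps are hypothetical, and timestamps are assumed to be in seconds.

```python
def bandwidth_gbps(starts, ends, message_bytes):
    """Return (per-message bandwidths, aggregate bandwidth) in Gb/s (10^9 bits/s)."""
    if len(starts) != len(ends):
        raise ValueError("need one end timestamp per start timestamp")
    per_message = []
    total_time = 0.0
    for t0, t1 in zip(starts, ends):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("end timestamp must be later than start timestamp")
        total_time += elapsed
        # bytes -> bits, divide by elapsed seconds, scale to gigabits
        per_message.append(message_bytes * 8 / elapsed / 1e9)
    # Aggregate counts only the time spent inside transmissions,
    # not the gaps between them.
    aggregate = len(per_message) * message_bytes * 8 / total_time / 1e9
    return per_message, aggregate


# Hypothetical data: three 100 MB messages (1 MB taken as 10^6 bytes),
# each taking 0.0625 s between its "before" and "after" timestamp.
starts = [0.0, 0.125, 0.25]
ends = [0.0625, 0.1875, 0.3125]
per_message, aggregate = bandwidth_gbps(starts, ends, 100 * 10**6)
print(per_message, aggregate)
```

    Looking at the per-message values as well as the aggregate helps to spot whether the bandwidth is uniformly low or whether a few slow transfers (for example the first one) drag the average down.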

    The results show that between nodes with Asus boards the bandwidth is approximately 10-12 Gb/s: 10,000 MB (100 x 100 MB) take 7-8 seconds. If a node with an Intel board is involved, the bandwidth is only 6-8 Gb/s. The HCAs provide 20 Gb/s!
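
    As a quick check of the arithmetic behind these figures, the conversion from transfer time to Gb/s can be written out. The sketch assumes 1 MB = 10^6 bytes; the post does not say which definition the program uses, and the helper name `gbps` is made up for this example.

```python
MB = 10**6  # assumption: decimal megabytes; with 2**20 the results are about 5% higher


def gbps(total_bytes, seconds):
    """Convert a byte count and a duration into Gb/s (10^9 bits per second)."""
    return total_bytes * 8 / seconds / 1e9


total_bytes = 100 * 100 * MB      # 100 messages of 100 MB each
slow = gbps(total_bytes, 8.0)     # 8 s for the whole run
fast = gbps(total_bytes, 7.0)     # 7 s for the whole run

print(f"{slow:.1f} to {fast:.1f} Gb/s")
print(f"{slow / 20:.0%} to {fast / 20:.0%} of the nominal 20 Gb/s")
```

    Under that assumption, 7-8 seconds corresponds to roughly 10 to 11.4 Gb/s, which is consistent with the 10-12 Gb/s quoted for the Asus nodes.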

    Motherboard and HCA BIOS/firmware are up to date. I also tried different PCIe slots on the boards, other cables, and different ports on the HCAs, with no effect on the bandwidth.

    Any ideas?

    Thanks in advance,

    Wolfgang

    ----------------------------------------

    Output of vstat:

     

     hca_idx=0
     uplink={BUS=PCI_E, SPEED=2.5 Gbps, WIDTH=x8, CAPS=2.5*x8}
     vendor_id=0x02c9
     vendor_part_id=25218
     hw_ver=0x20
     fw_ver=5.03.0000
     PSID=MT_0370140002
     node_guid=0002:c902:0029:96b0
     num_phys_ports=2
       port=1
       port_state=PORT_DOWN (1)
       link_speed=NA
       link_width=NA
       rate=NA
       port_phys_state=POLLING (2)
       active_speed=2.5 Gbps (1)
       sm_lid=0x0000
       port_lid=0x0000
       port_lmc=0x0
       max_mtu=2048 (4)
    
       port=2
       port_state=PORT_ACTIVE (4)
       link_speed=5.0 Gbps (2)
       link_width=4x (2)
       rate=20 Gbps
       port_phys_state=LINK_UP (5)
       active_speed=5.0 Gbps (2)
       sm_lid=0x0005
       port_lid=0x0005
       port_lmc=0x0
       max_mtu=2048 (4)
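
    When reading this output, it may help to keep in mind that the rates shown are signalling rates. Both the PCIe uplink at 2.5 Gbps per lane and the DDR InfiniBand link at 5.0 Gbps per lane use 8b/10b encoding, so the usable data rate is lower than the headline number. The following sketch works that out from the `uplink` and port 2 values above; the helper name `data_rate_gbps` is made up for this example, and further protocol overhead (packet headers, flow control) is not modelled.

```python
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits carried per 10 signalled bits


def data_rate_gbps(per_lane_gbps, lanes):
    """Usable data rate in Gb/s for a link with the given lane speed and width."""
    return per_lane_gbps * lanes * ENCODING_EFFICIENCY


pcie = data_rate_gbps(2.5, 8)  # uplink: SPEED=2.5 Gbps, WIDTH=x8
ib = data_rate_gbps(5.0, 4)    # port 2: link_speed=5.0 Gbps, link_width=4x

print(f"PCIe uplink: {pcie:.1f} Gb/s, IB port 2: {ib:.1f} Gb/s")
```

    On these assumptions, both the host bus and the IB link top out at about 16 Gb/s of payload rather than 20 Gb/s, so the measured values should be compared against that ceiling.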
    

     

    • Moved by parmita mehta (Moderator), Monday, October 25, 2010 10:35 PM (From: Windows HPC Server Deployment, Management, and Administration)
    Wednesday, October 20, 2010 10:35 AM

Answers

  • Hello,

    Have you installed the service provider with ndinstall -i?

    What's the output of ndinstall -l? It looks like you haven't enabled IB.

    Thanks,

    James

    Tuesday, November 16, 2010 7:45 PM

All replies