Simplest way to pass messages between two machines over Infiniband

  • Question

  • Hi,

    I am trying to port a fault-tolerant application that currently passes messages over TCP sockets so that it uses Infiniband instead, in order to improve latency (bandwidth is basically irrelevant). MPI looked like a good choice because I expected it to be plug and play, but I am having difficulty getting the software stack to work and am now wondering whether it is overkill for our needs: our application is the special case of a single connection between two nodes that will only ever run a single job.

    I have a simple test program that works perfectly when run as two processes on the same machine (172.16.3.54) from the command line, once MS-MPI's smpd has been started:

    mpiexec.exe -hosts 2 172.16.3.54 1 172.16.3.54 1 Infiniband.exe

    but if I try to run the same thing on two machines I get:

    mpiexec.exe -hosts 2 172.16.3.54 1 1.1.1.2 1 Infiniband.exe

    Aborting: Access denied by node '1.1.1.2'.
    A common cause: this node is a resource managed by the Compute Cluster scheduler and mpiexec was attempting to use it without a scheduled job.
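
    For reference, the kind of test I mean is just a small ping-pong that times round trips between the two ranks. A rough sketch of that sort of program (not the exact source of Infiniband.exe, just the shape of it, using the standard MPI C API) is:

    #include <mpi.h>
    #include <stdio.h>

    /* Rough sketch of a two-rank ping-pong latency test. */
    int main(int argc, char *argv[])
    {
        int rank, size, i;
        char buf[64] = "ping";
        const int iters = 10000;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
            if (rank == 0) fprintf(stderr, "Run with exactly 2 processes.\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Barrier(MPI_COMM_WORLD);   /* line the two ranks up before timing */
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("Average round trip: %.2f us\n", 1e6 * (t1 - t0) / iters);

        MPI_Finalize();
        return 0;
    }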

    Now, I obviously don't want access checking, cluster resource management or job scheduling. Is it possible to override this so that I can run a program simply using mpiexec, or must I deal with the job scheduler? If it is not possible, where do I get the job scheduler? ("job" is an unknown command here.) Do you think I should even be using MPI/CCP/HPC for this, or should I just write against the Mellanox drivers and sidestep MPI entirely?

    I have also tried Intel's MPI 4 library but I cannot get it to run my test program under any conditions.

    Although this may seem like an obscure use case, I believe these kinds of requirements will be relatively common in the finance sector because this is the best way to achieve fault tolerance with minimal cost in terms of latency (which is the benchmark for these systems).

    Many thanks,
    Jon Harrop.

     

    Friday, February 25, 2011 5:08 PM

All replies

  • Hello Jon,

    You can work around the problem by disabling the job scheduler service on the compute node where you want to run your MPI job with the mpiexec command. The job scheduler service can be stopped with the command net stop HpcScheduler. After you have finished your MPI job, you can restart it with net start HpcScheduler.
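
    For example, the whole cycle would look something like this (the net commands go on the node that the scheduler manages, and the mpiexec line is just the one from your post):

    net stop HpcScheduler
    mpiexec.exe -hosts 2 172.16.3.54 1 1.1.1.2 1 Infiniband.exe
    net start HpcScheduler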

    Thanks,

    James

    Monday, February 28, 2011 5:17 PM
  • Hi James,

    Running "net stop HpcScheduler" gives the error "The service name is invalid". Looking at the services available here, none begin with Hpc.

    I only had the Compute Cluster Pack installed, so I also installed HPC Cluster Manager, HPC Job Manager and HPC PowerShell from HPC Pack 2008 R2, but I still don't see a service with that name and still get the same error from that command.

    How do I get this HpcScheduler service that I need to stop, and why would mpiexec not work if I don't even have a scheduler installed?

    Many thanks,
    Jon Harrop.

     

    Tuesday, March 1, 2011 3:23 PM
  • Hi Jon,

    You may try the following:

    1. Disable the MSMPI service by running:

        net stop msmpi

    2. Make sure the SMPD daemon is started on all compute nodes.
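
    For example (assuming the smpd.exe that ships with MS-MPI, and that your version supports a debug flag), you can run it in the foreground on each node with:

        smpd -d

    Running it in debug mode should also print connection activity, which may help show why the launch is being rejected.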

    Please let me know whether this works.

    -James

      

    Thursday, March 10, 2011 9:34 PM