Juniper VPN client breaks mpiexec.exe

    Question

  • I am running a local MPI job like this:

     

    mpiexec.exe -n 2 process.exe

     

    When I start my Juniper VPN client, the MPI job still runs fine, but once I close the VPN connection the job no longer works. mpiexec.exe still spawns an instance of smpd.exe, but the two processes can no longer talk to each other. It seems to be upset by the network adapters and IP addresses changing when the VPN client starts and stops (but only with the Juniper client; I never saw this with the Cisco AnyConnect VPN client I used previously).

     

    Has anyone seen a similar problem with the interaction between MS-MPI and a VPN client? Any insight into how mpiexec.exe and smpd.exe interact with the TCP/IP stack and the network adapters on a system, or into command-line switches that might serve as a workaround?

    Friday, April 15, 2011 8:27 PM

All replies

  • In case this ever helps anyone else, here is a summary of what I found. Let’s say I have an IP address of 10.1.1.101, and I run this command:

    mpiexec -n 2 cmd /C echo hello

    In this case, mpiexec launches smpd, which listens on a random port, and mpiexec connects to smpd at IP address 10.1.1.101. This works fine, and the result is:

    hello
    hello

    Then I connect to the VPN using the Juniper client, which gives me IP address 10.1.2.101, and I run the same command. Again mpiexec launches smpd, which listens on a random port, and mpiexec connects to smpd at IP address 10.1.2.101. This still works and prints the two “hellos”.

    But when I disconnect from the VPN, the client has apparently changed something in the TCP/IP stack or the network adapter settings. I have my original IP address again, 10.1.1.101, but when I run the original command, mpiexec launches smpd, smpd listens on a random port, and mpiexec tries to connect to that port at 10.1.1.101 and can no longer reach it. It fails with this error message:

    ConnectFailed(986): unable to connect to 10.1.1.101 on port 45678, exhausted all endpoints

    I did find one way to get it working: if I start smpd manually and then run this command, it works:

    mpiexec -n 2 -host 127.0.0.1 cmd /C echo hello

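    So the difference is that connections to the machine's own IP address, 10.1.1.101, no longer work after the VPN disconnects, while connections to the loopback address 127.0.0.1 still do. For some reason the endpoint 10.1.1.101:45678 can no longer be reached, even though both processes are on the same machine.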
    Wednesday, April 20, 2011 2:33 PM
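
For anyone who wants to check the same symptom without involving MPI at all, here is a minimal Python sketch. It is not how mpiexec or smpd work internally; it only opens a listening TCP socket on an OS-assigned port and then tries to connect to that port through a given address. Binding the listener to all interfaces is an assumption made for the test, not something confirmed for smpd, and the function name `can_reach_own_listener` is made up for illustration.

```python
import socket


def can_reach_own_listener(address, timeout=2.0):
    """Return True if this host can connect to its own listening socket via `address`.

    Hypothetical helper for illustration only; it does not reproduce smpd's behaviour.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Assumption: listen on all IPv4 interfaces; port 0 lets the OS pick a free port.
        server.bind(("0.0.0.0", 0))
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The TCP handshake completes in the listen backlog, so no accept() is needed.
            with socket.create_connection((address, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, unreachable networks, and timeouts.
            return False
```

Calling `can_reach_own_listener("127.0.0.1")` and then the same function with the machine's own address (10.1.1.101 in the example above) should show whether the host can still reach itself over that address after the VPN disconnects. If loopback succeeds and the other address fails, that matches the behaviour described in the reply above.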