Friday, 15 April 2011 20:27
I am running a local mpi job like this:
mpiexec.exe -n 2 process.exe
When I start my Juniper VPN client, the MPI job still runs fine, but once I close the VPN connection the MPI job no longer works. mpiexec.exe still spawns an instance of smpd.exe, but the two processes can no longer talk to each other. It seems to be upset by the network adapters and IP addresses changing when the VPN client starts and stops (but only with the Juniper client; I never saw this happen with the Cisco AnyConnect VPN client that I used to use).
Has anyone seen a similar problem with the interaction between MS-MPI and a VPN client? Any insight into how mpiexec.exe and smpd.exe interact with the TCP/IP stack and network adapters on a system, or whether there are any command-line switches that could serve as a workaround?
Wednesday, 20 April 2011 14:33
In case this ever helps anyone else, here is a summary of what I found. Let's say I have an IP address of 10.1.1.101, and I run this command:
mpiexec -n 2 cmd /C echo hello
In this case, mpiexec launches smpd, which listens on a random port, and mpiexec connects to smpd using IP address 10.1.1.101. This works fine, and the command prints "hello" twice, once per process.
Then I connect to the VPN using the Juniper client, which gives me IP address 10.1.2.101, and I run the same command. Again mpiexec launches smpd, which listens on a random port, and mpiexec connects to smpd using IP address 10.1.2.101. This still works and gives the two "hellos".
But when I disconnect from the VPN, the client leaves behind some change to the TCP/IP stack, the network adapter settings, or something similar. I have my original IP address again, 10.1.1.101, but when I run the original command, mpiexec launches smpd, smpd listens on a random port, and mpiexec tries to connect to that port on 10.1.1.101 and can no longer reach it. It fails with this error message:
ConnectFailed(986): unable to connect to 10.1.1.101 on port 45678, exhausted all endpoints
So I have found one way to get it working: if I start smpd manually, then use this command, it works:
mpiexec -n 2 -host 127.0.0.1 cmd /C echo hello
So, the difference is that the ip address 10.1.1.101 no longer works, but 127.0.0.1 does work. Somehow the endpoint 10.1.1.101:45678 cannot be reached anymore.
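To check this symptom outside of MPI, here is a rough Python sketch that mimics the pattern described above: listen on an OS-chosen port at a given local address, then try to connect to that same address and port. It is only a diagnostic probe, not how mpiexec or smpd are actually implemented, and the function name can_reach_own_listener is made up for this example. The address 10.1.1.101 is the example address from this post; substitute your own.

```python
import socket


def can_reach_own_listener(address, timeout=2.0):
    """Listen on an OS-chosen port at `address`, then try to connect to it.

    Returns True if the connection succeeds, False on any socket error
    (address not assigned to this machine, connection refused, timeout, ...).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind((address, 0))  # port 0 lets the OS pick a free port
            listener.listen(1)
            port = listener.getsockname()[1]
            # The TCP handshake completes against the listen backlog,
            # so no explicit accept() is needed for this check.
            with socket.create_connection((address, port), timeout=timeout):
                return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10.1.1.101 is the example address from the post, not a real default.
    for address in ("127.0.0.1", "10.1.1.101"):
        status = "reachable" if can_reach_own_listener(address) else "NOT reachable"
        print(f"{address}: {status}")
```

If the loopback address is reported as reachable but the machine's own adapter address is not, that matches the behavior seen after disconnecting the VPN, and it would suggest the problem lies in the network stack or adapter state rather than in MPI itself.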