Hi,
I am getting an error while running an MPI program with mpiexec.exe in an Azure Batch multi-instance task (Batch MPI).
The complete debug output is below. Any help would be appreciated.
[00:1952] creating connect command to '100.104.58.44'
[00:1952] posting command SMPD_CONNECT to left child, src=0, dest=1.
[00:1952] host 100.104.42.32 is not connected yet
[00:1952] Handling cmd=SMPD_CONNECT result
[00:1952] cmd=SMPD_CONNECT result will be handled locally
[00:1952] successful connect to 100.104.42.32.
[00:1952] creating connect command for left node
[00:1952] creating connect command to '100.104.36.14'
[00:1952] posting command SMPD_CONNECT to left child, src=0, dest=2.
[00:1952] host 100.104.58.44 is not connected yet
[00:1952] Handling cmd=SMPD_CONNECT result
[00:1952] cmd=SMPD_CONNECT result will be handled locally
[00:1952] successful connect to 100.104.58.44.
[00:1952] creating connect command for left node
[00:1952] creating connect command to '100.104.40.33'
[00:1952] posting command SMPD_CONNECT to left child, src=0, dest=3.
[00:1952] host 100.104.36.14 is not connected yet
[00:1952] Handling cmd=SMPD_CONNECT result
[00:1952] cmd=SMPD_CONNECT result will be handled locally
[00:1952] successful connect to 100.104.36.14.
[00:1952] host 100.104.40.33 is not connected yet
[00:1952] Handling cmd=SMPD_CONNECT result
[00:1952] cmd=SMPD_CONNECT result will be handled locally
[00:1952] successful connect to 100.104.40.33.
[00:1952] posting command SMPD_COLLECT to left child, src=0, dest=1.
[00:1952] posting command SMPD_COLLECT to left child, src=0, dest=2.
[00:1952] posting command SMPD_COLLECT to left child, src=0, dest=3.
[00:1952] posting command SMPD_COLLECT to left child, src=0, dest=4.
[00:1952] posting command SMPD_COLLECT to left child, src=0, dest=5.
[00:1952] Handling cmd=SMPD_COLLECT result
[00:1952] cmd=SMPD_COLLECT result will be handled locally
[00:1952] Handling cmd=SMPD_COLLECT result
[00:1952] cmd=SMPD_COLLECT result will be handled locally
[00:1952] Handling cmd=SMPD_COLLECT result
[00:1952] cmd=SMPD_COLLECT result will be handled locally
[00:1952] Handling cmd=SMPD_COLLECT result
[00:1952] cmd=SMPD_COLLECT result will be handled locally
[00:1952] Handling cmd=SMPD_COLLECT result
[00:1952] cmd=SMPD_COLLECT result will be handled locally
[00:1952] Finished collecting hardware summary.
[00:1952] posting command SMPD_STARTDBS to left child, src=0, dest=1.
[00:1952] Handling cmd=SMPD_STARTDBS result
[00:1952] cmd=SMPD_STARTDBS result will be handled locally
[00:1952] start_dbs succeeded, kvs_name: '24867516-e62e-4edb-b682-594de53e15c5', domain_name: '030367df-c5fd-4ccd-bcb6-0a4a56730821'
[00:1952] creating a process group of size 5 on node 0 called 24867516-e62e-4edb-b682-594de53e15c5
[00:1952] launching the processes.
[00:1952] posting command SMPD_LAUNCH to left child, src=0, dest=1.
[00:1952] posting command SMPD_LAUNCH to left child, src=0, dest=2.
[00:1952] posting command SMPD_LAUNCH to left child, src=0, dest=3.
[00:1952] posting command SMPD_LAUNCH to left child, src=0, dest=4.
[00:1952] posting command SMPD_LAUNCH to left child, src=0, dest=5.
[00:1952] Handling cmd=SMPD_LAUNCH result
[00:1952] cmd=SMPD_LAUNCH result will be handled locally
[00:1952] successfully launched process 0
[00:1952] root process launched, starting stdin redirection.
[00:1952] Handling cmd=SMPD_LAUNCH result
[00:1952] cmd=SMPD_LAUNCH result will be handled locally
[00:1952] successfully launched process 1
[00:1952] Handling cmd=SMPD_LAUNCH result
[00:1952] cmd=SMPD_LAUNCH result will be handled locally
[00:1952] successfully launched process 2
[00:1952] Unable to get the stdin handle.
[00:1952] stdin to mpiexec closed. sending stdin_close command.
[00:1952] posting command SMPD_STDIN_CLOSE to left child, src=0, dest=1.
[00:1952] Handling cmd=SMPD_STDIN_CLOSE result
[00:1952] cmd=SMPD_STDIN_CLOSE result will be handled locally
[00:1952] Handling cmd=SMPD_LAUNCH result
[00:1952] cmd=SMPD_LAUNCH result will be handled locally
[00:1952] successfully launched process 3
[00:1952] Handling cmd=SMPD_LAUNCH result
[00:1952] cmd=SMPD_LAUNCH result will be handled locally
[00:1952] successfully launched process 4
[00:1952] Authentication completed. Successfully obtained Context for Client.
[00:1952] Authorization completed.
[00:1952] handling command SMPD_ABORT src=1
Aborting: smpd on RD0003FF98EE42 failed to communicate with child smpd manager
[00:1952] smpd manager successfully stopped listening.
Thanks,
Kiran.