I have a C++ program that uses MS-MPI through Boost.MPI. I usually run it on a Windows HPC Pack cluster (12 nodes, 32 cores each). It runs without problems on one, two, or four nodes, but when I use all 12 nodes it runs for a while and then fails. It fails every time; it has not succeeded once. The error message in the output looks like this:
job aborted:
[ranks] message
[0] process exited without calling finalize
[1-383] terminated
---- error analysis -----
[0] on XXXXX
Model.exe ended prematurely and may have crashed. exit code 0xc0000409
---- error analysis -----
The error output is not readable. It is just a long column of single characters, one per line. Written out on one line, it is:
A A s A s s e s r s t s A e i e A r o r A t n t s A i A A A i f o s o A s A n s A A A a s s n s A s A s s s i A A s A A s
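The characters above look like output from many ranks written to the same stream at once (12 nodes × 32 cores = 384 ranks). One way to make it readable might be to give each rank its own stderr file. Here is a minimal sketch of that idea, not something tried on the cluster yet. The helper name `redirect_stderr_for_rank` and the `<prefix>.<rank>.log` naming are hypothetical choices. The rank would come from Boost.MPI (`boost::mpi::communicator::rank()`).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical debugging helper (not part of Boost.MPI or MS-MPI):
// redirect this process's stderr to "<prefix>.<rank>.log" so that messages
// from different MPI ranks land in separate files instead of being
// interleaved in the job output. Returns true on success.
//
// In the real program the rank would come from Boost.MPI, e.g.
//   boost::mpi::environment env(argc, argv);
//   boost::mpi::communicator world;
//   redirect_stderr_for_rank(world.rank(), "model_stderr");
bool redirect_stderr_for_rank(int rank, const std::string& prefix) {
    assert(rank >= 0 && "MPI ranks are non-negative");
    const std::string path = prefix + "." + std::to_string(rank) + ".log";

    // freopen re-targets the existing stderr stream, so later
    // fprintf(stderr, ...) calls (and, on the usual implementations,
    // std::cerr, which is synchronized with stdio by default) write
    // into the per-rank file.
    if (std::freopen(path.c_str(), "w", stderr) == nullptr) {
        return false;
    }

    // Unbuffered, so the last message before a crash is not lost in a buffer.
    std::setvbuf(stderr, nullptr, _IONBF, 0);
    return true;
}
```

If this is called right after `boost::mpi::environment` is constructed, whatever rank 0 writes to stderr before the 0xc0000409 exit should end up in `model_stderr.0.log`. Where the files land depends on each node's working directory. Whether the C runtime's own diagnostic messages also go through this stream is an assumption that would need checking on Windows.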
If any Microsoft experts can suggest how to debug this, I would really appreciate it. Thanks.