locked
Random crash at start when launched from batch

  • Question

  • Hello,

       We are currently finalizing our move from MPICH to MS-MPI. There is still one issue that prevents us from releasing a version linked with MS-MPI.

       From time to time, seemingly at random, MS-MPI fails to start, but when launched again it works. I do not know whether this is relevant, but so far the failure has only appeared when launched from a batch (several MPI jobs were launched sequentially).

       I am using the "MsMpiLaunchSvc" service and the command line is like this:

    pathToMSMPI\mpiexec  -n 10 -host localhost pathToMyExe\.exe  2>&1

       The error is
       
    job aborted:
    [ranks] message
    
    [0-2] terminated
    
    [3] process exited without calling finalize
    
    [4-9] terminated
    
    ---- error analysis -----
    
    [3] on localhost
    pathToExe\hsrdf.exe ended prematurely and may have crashed. exit code 0x80000003

    Have I missed something?


      Kind regards,

    Guillaume
    Thursday, March 3, 2016 9:31 AM

All replies

  • Hi Guillaume,

    The error message indicates that the crash occurred within the hsrdf code and not while executing an MPI function. However, it's possible that a faulty MPI execution left some invalid state that later caused hsrdf to crash. The exit code looks like an exception code, but I can't tell exactly what happened from the exit code alone.
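
    As a rough aid for reading such exit codes: on Windows, a process killed by an unhandled exception usually exits with the exception's NTSTATUS value, and 0x80000003 is STATUS_BREAKPOINT. That often means the process hit a breakpoint instruction (for example from a failed assertion or a `DebugBreak` call) with no debugger attached, though that is only a likely reading and not a diagnosis. The sketch below decodes a few well-known codes; the function name `describe_exit_code` and the short lookup table are illustrative choices, not part of MS-MPI or any Windows API.

```python
# Minimal sketch: interpret a Windows process exit code as an NTSTATUS value.
# The table is a small hand-picked subset of well-known codes, not a full list.
KNOWN_NTSTATUS = {
    0x80000003: "STATUS_BREAKPOINT",
    0x80000004: "STATUS_SINGLE_STEP",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

# The top two bits of an NTSTATUS value encode its severity.
SEVERITY = ("success", "informational", "warning", "error")


def describe_exit_code(code: int) -> str:
    """Return a readable description of an exit code (hypothetical helper)."""
    code &= 0xFFFFFFFF  # normalize values reported as signed 32-bit integers
    severity = SEVERITY[code >> 30]
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X}: {name} (severity: {severity})"


print(describe_exit_code(0x80000003))
```

    An exit code that is not in the table is reported as "unknown"; ordinary application exit codes such as 1 are not NTSTATUS values at all and should not be read this way.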

    I would suggest that you enable a post-mortem debugger to trigger a debugger break-in when the exception is raised. Enabling AppVerifier for hsrdf will sometimes provide useful information as well.

    1) You can download Windbg (Windows Debugger) by installing the Windows SDK here:

    http://go.microsoft.com/fwlink/p?LinkID=271979

    After installation, you can register Windbg as the post-mortem debugger by running it from a command console with the -I flag (e.g. C:\Debuggers\Windbg.exe -I).

    2) App Verifier is also available as part of the Windows SDK

    http://go.microsoft.com/fwlink/p?LinkID=271979

    Open AppVerifier, add hsrdf.exe to the list of executables that AppVerifier tracks, and then save the configuration. You can just use the default settings (Basics).

    When the exception happens, a debugger window will open. At that point you can obtain a memory dump and a stack trace to see why the exception was raised.
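
    Because the failure only shows up now and then inside a batch, it may help to record which rank crashed in each failed run so the dump can be matched to it. The following is a minimal sketch that parses the "job aborted" report quoted in the question. It assumes the report always has that layout; the function name `parse_mpiexec_abort` and the regular expressions are my own, not something mpiexec provides.

```python
import re

# "[3] message" or "[0-2] message"; the "[ranks] message" header has no digits
# and is therefore skipped.
RANK_LINE = re.compile(r"^\s*\[(\d+)(?:-(\d+))?\]\s+(.*\S)\s*$")
EXIT_CODE = re.compile(r"exit code (0x[0-9a-fA-F]+|-?\d+)")


def parse_mpiexec_abort(output: str):
    """Return (crashed_ranks, exit_codes) found in an mpiexec abort report."""
    crashed, exit_codes = [], []
    for line in output.splitlines():
        m = RANK_LINE.match(line)
        if m and "without calling finalize" in m.group(3):
            first = int(m.group(1))
            last = int(m.group(2)) if m.group(2) else first
            crashed.extend(range(first, last + 1))
        m = EXIT_CODE.search(line)
        if m:
            text = m.group(1)
            base = 16 if text.lower().startswith("0x") else 10
            exit_codes.append(int(text, base))
    return crashed, exit_codes


# The report from the question, used here as sample input.
SAMPLE = r"""job aborted:
[ranks] message

[0-2] terminated

[3] process exited without calling finalize

[4-9] terminated

---- error analysis -----

[3] on localhost
pathToExe\hsrdf.exe ended prematurely and may have crashed. exit code 0x80000003
"""

ranks, codes = parse_mpiexec_abort(SAMPLE)
print(ranks, [hex(c) for c in codes])
```

    A batch wrapper could capture the combined output of each mpiexec run (the command in the question already redirects stderr with 2>&1) and pass it to this function whenever the run fails.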


    Thursday, March 3, 2016 3:02 PM