exit code -1073741515

  • Question

  • Hello

     I get this error when I run my application on 2 nodes through the HPC Cluster Manager (mpiexec -n 2 ****.exe):

    -----------

    Task failed during execution with exit code -1073741515. Please check task's output for error details.

    -----------

    There is nothing in the task output. Can you please let me know what is going wrong?

    Thanks & Regards,

    Kunal

    Friday, August 6, 2010 1:33 AM

Answers

  • Hello Kunal,

    My suggestion is to find out, on each compute node, which dependent DLLs are missing. You only need to copy the missing ones. It looks to me like the pgf* DLLs may be the missing ones, but you should double-check that.

    Also, could you try running your MPI app on the head node only? For example, use the command

    mpiexec -n 2 [path to your app]\[your app]

    If all required DLLs are copied to the same location as your MPI app, it should no longer give that error code. You can then check which DLLs are missing on your compute nodes.
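The check suggested above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: `find_missing` and `default_search_dirs` are hypothetical helper names, and the search is simplified to the application directory followed by the PATH entries. The real Windows loader also consults system directories and side-by-side assemblies, so system DLLs such as KERNEL32.dll or MSVCR90.dll may be reported as missing here even when they load fine. It is most useful for compiler runtime DLLs like the pgf* ones.

```python
import os

def find_missing(dll_names, search_dirs):
    """Return the names in dll_names that are not present in any of search_dirs.

    File names are compared case-insensitively, as on Windows.
    """
    missing = []
    for name in dll_names:
        wanted = name.lower()
        found = False
        for directory in search_dirs:
            try:
                entries = os.listdir(directory)
            except OSError:
                continue  # nonexistent or unreadable directory: skip it
            if any(entry.lower() == wanted for entry in entries):
                found = True
                break
        if not found:
            missing.append(name)
    return missing

def default_search_dirs(app_dir):
    """Application directory first, then each PATH entry (a simplification)."""
    path_dirs = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return [app_dir] + path_dirs

if __name__ == "__main__":
    # Hypothetical usage: run on each node from the directory that holds the app.
    runtime_dlls = ["pgftnrtl.dll", "pgf90.dll", "pgf90rtl.dll", "pgc.dll"]
    print(find_missing(runtime_dlls, default_search_dirs(os.getcwd())))
```

Running the same script on the head node and on each compute node, then comparing the printed lists, gives a quick first indication of which node lacks which DLL.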

    Thanks,

    James

    • Marked as answer by Kunal Rao Tuesday, August 10, 2010 5:57 PM
    Monday, August 9, 2010 7:33 PM

All replies

  • Hello,

    The exit code -1073741515 is 0xC0000135 when read as an unsigned 32-bit value. It can be found in ntstatus.h with the following definition:

      STATUS_DLL_NOT_FOUND
    # {Unable To Locate Component}
    # This application has failed to start because %hs was not
    # found. Re-installing the application may fix this problem.
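
For readers who want to map a negative exit code like this one to its NTSTATUS value, here is a minimal Python sketch. The helper names `to_ntstatus` and `describe` are hypothetical, and the lookup table contains only the one code discussed in this thread; other codes would need to be looked up in ntstatus.h.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF

# Only the entry relevant to this thread; extend from ntstatus.h as needed.
KNOWN_STATUS = {0xC0000135: "STATUS_DLL_NOT_FOUND"}

def describe(exit_code):
    """Format an exit code as hex, with its symbolic name if it is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"

if __name__ == "__main__":
    print(describe(-1073741515))  # prints "0xC0000135 (STATUS_DLL_NOT_FOUND)"
```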

    My guess is that the problem was introduced when you compiled your MPI application. It is hard to tell which DLL is missing from the error code alone. I suggest you delete all copies of your app, do a clean compile, and try again.

    If the problem persists, please give more details: the compiler version, the msmpi.dll version (I assume you use Microsoft MPI), and the DLLs your app depends on.

    Thanks,

    James

    Friday, August 6, 2010 5:51 PM
  • Hi James,

     Thanks for your reply.

     I am using the PGI compilers (10.5) to compile my code. The application's dependent DLLs are as follows (obtained with dumpbin.exe):

     -----------

    Image has the following dependencies:

        PSAPI.DLL
        msmpi.dll
        pgftnrtl.dll
        pgf90.dll
        pgf90rtl.dll
        MSVCR90.dll
        pgc.dll
        KERNEL32.dll
    ---------------
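
A listing like the one above can be turned into a list of DLL names with a short script. The following is a sketch, assuming the output was produced by dumpbin (presumably with its /DEPENDENTS option) and has the layout shown here. `parse_dependencies` is a hypothetical helper, and treating names that start with "pg" as PGI runtime DLLs is an assumption based on this thread, not a general rule.

```python
import re

SAMPLE = """\
Image has the following dependencies:

    PSAPI.DLL
    msmpi.dll
    pgftnrtl.dll
    pgf90.dll
    pgf90rtl.dll
    MSVCR90.dll
    pgc.dll
    KERNEL32.dll
---------------
"""

def parse_dependencies(text):
    """Return the DLL names listed after the 'dependencies' header line."""
    deps = []
    in_list = False
    for line in text.splitlines():
        stripped = line.strip()
        if "has the following dependencies" in stripped:
            in_list = True
            continue
        if not in_list or not stripped:
            continue  # skip text before the header and blank lines
        if re.fullmatch(r"[\w.\-]+\.dll", stripped, re.IGNORECASE):
            deps.append(stripped)
        elif deps:
            break  # first non-DLL line after the list ends it
    return deps

if __name__ == "__main__":
    deps = parse_dependencies(SAMPLE)
    # Assumption: names starting with "pg" belong to the PGI runtime.
    pgi_runtime = [d for d in deps if d.lower().startswith("pg")]
    print(pgi_runtime)
```

The resulting list is the set of files worth checking for on every node that runs the job.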

    I have copied all of these DLLs to the //Head-Node/public folder, where I keep the executable.

    Even then, I still get that error.

    (The msmpi.dll I am using is version 2.0.1551.0.)

    For more details, here is the output of: job view JOBID /detailed

    ---------------
    c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC>job view 336 /detailed
    Id:                      : 336
    Name:                    : Flash
    SubmitTime:              : 8/6/2010 9:43:36 PM
    CreateTime:              : 8/6/2010 9:43:22 PM
    StartTime:               : 8/6/2010 9:43:36 PM
    EndTime:                 : 8/6/2010 9:43:51 PM
    ChangeTime:              : 8/6/2010 9:43:22 PM
    UnitType:                : Core
    MinCores:                : 1
    MaxCores:                : 1
    MinSockets:              : 1
    MaxSockets:              : 1
    MinNodes:                : 1
    MaxNodes:                : 1
    RunUntilCanceled:        : False
    IsExclusive:             : False
    ErrorCode:               : -2147218980
    ErrorParams:             : 336.1
    State:                   : Failed
    PreviousState:           : Running
    UserName:                : HPC\Administrator
    JobType:                 : Batch
    Priority:                : Normal
    RequestedNodes:          : COMPUTE-NODE-2,HEAD-NODE
    RequiredNodes:           :
    IsBackfill:              : False
    NextTaskNiceID:          : 2
    HasGrown:                : False
    HasShrunk:               : False
    OrderBy:                 :
    TaskLevelUpdateTime:     : 8/6/2010 9:43:36 PM
    MinMaxUpdateTime:        : 8/6/2010 9:43:36 PM
    ComputedMinCores:        : 1
    ComputedMaxCores:        : 2
    RequestCancel:           : None
    RequeueCount:            : 0
    AutoRequeueCount:        : 0
    FailureReason:           : None
    PendingReason:           :
    ComputedNodeList:        : COMPUTE-NODE-2,HEAD-NODE
    AutoCalculateMax:        : True
    AutoCalculateMin:        : True
    ParentJobId:             : 0
    ChildJobId:              : 0
    NumberOfCalls:           : 0
    NumberOfOutstandingCalls: : 0
    CallDuration:            : 0
    CallsPerSecond:          : 0
    FailOnTaskFailure:       : False
    Preemptable:             : True
    ProjectId:               : 1
    JobTemplateId:           : 1
    OwnerId:                 : 3
    ClientSourceId:          : 3
    Project:                 :
    JobTemplate:             : Default
    DefaultTaskGroupId:      : 336
    Owner:                   : HPC\Administrator
    Id:                      : 336
    TaskCount:               : 1
    ConfiguringTaskCount:    : 0
    SubmittedTaskCount:      : 0
    ValidatingTaskCount:     : 0
    QueuedTaskCount:         : 0
    DispatchingTaskCount:    : 0
    RunningTaskCount:        : 0
    FinishingTaskCount:      : 0
    FinishedTaskCount:       : 0
    FailedTaskCount:         : 1
    CanceledTaskCount:       : 0
    CancelingTaskCount:      : 0
    ClientSource:            : HpcClusterManager
    OfflineResourceCount:    : 0
    IdleResourceCount:       : 0
    ReservedResourceCount:   : 0
    JobScheduledResourceCount: : 0
    ReadyForTaskResourceCount: : 0
    TaskScheduledResourceCount: : 0
    JobTaskScheduledResourceCount: : 0
    TaskDispatchedResourceCount: : 0
    TaskRunningResourceCount: : 0
    CloseTaskResourceCount:  : 0
    CloseTaskDispatchedResourceCount: : 0
    TaskClosedResourceCount: : 0
    CloseJobResourceCount:   : 0
    TotalKernelTime:         : 156
    TotalUserTime:           : 15
    MemoryUsed:              : 239772
    AllocatedCores:          : COMPUTE-NODE-2 2
    AllocatedNodes:          : COMPUTE-NODE-2 1
    AllocatedSockets:        : COMPUTE-NODE-2 1
    ProcessIds:              :
    -----------------

    Kindly let me know if you have any suggestions.

    Thanks & Regards,
    Kunal

    P.S. I am using the -Bdynamic and -nodefaultlib=msvcrtd.lib flags when compiling the code.

    Saturday, August 7, 2010 2:00 AM
  • Hi James,

      Thanks a lot. That helped me resolve the issue. As you guessed, the pgf* DLLs were missing (I checked with Dependency Walker).

      I got it working on the head node first, and then on the compute node.

      Thank you very much!

    Thanks & Regards,

    Kunal

    Tuesday, August 10, 2010 5:57 PM