compute nodes became unreachable randomly!!!

  • Question

  • Hi,
    I have a problem with our HPC cluster: the compute nodes become unreachable at random, and all jobs running on those nodes stop working.
    Also, this error sometimes shows up on the head node:
    Windows NT Intersite Messaging Service has stopped working
    error details:
      Problem Event Name:    APPCRASH
      Application Name:    ismserv.exe
      Application Version:    6.0.6001.18000
      Application Timestamp:    4791966a
      Fault Module Name:    ismip.dll
      Fault Module Version:    6.0.6001.18000
      Fault Module Timestamp:    4791ad8a
      Exception Code:    c0000005
      Exception Offset:    0000000000005a5d
      OS Version:    6.0.6001.
      Locale ID:    1033
      Additional Information 1:    86de
      Additional Information 2:    4200a2a0bdf4f799aa942c465bfdf13c
      Additional Information 3:    23ff
      Additional Information 4:    637e5f46119c24aa761e48e05f314b3e
    Can anybody tell me what I should do?
    P.S. I have Windows Server 2008 Enterprise x64 on the head node and the compute nodes, and I am using HPC Pack 2008.
    Best regards.
    Saturday, June 27, 2009 4:43 AM


  • Hello Ali

    On an unresponsive node, take a look at HPCManagement.log. This log is located under "C:\Program Files\Microsoft HPC Pack\Data\Logfiles". Please reply with the errors you see in the log.
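
If the log is long, a small script can help pull out the lines worth posting. The following is only a rough sketch. It assumes the log is readable as plain text and that problem entries contain words such as "error", "exception", or "failed". Neither assumption is confirmed in this thread, and the keyword list and the `find_error_lines` helper are made up for illustration.

```python
import re
from pathlib import Path

# Location taken from the reply above; adjust if HPC Pack is installed elsewhere.
LOG_PATH = Path(r"C:\Program Files\Microsoft HPC Pack\Data\Logfiles") / "HPCManagement.log"

# Hypothetical keywords: the real log may label severity differently.
PATTERN = re.compile(r"\b(error|exception|fail(ed|ure)?)\b", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" keeps the scan going if the file has odd bytes.
        with LOG_PATH.open(encoding="utf-8", errors="replace") as handle:
            for number, text in find_error_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"Log not found: {LOG_PATH}")
```

Run it on the unresponsive node (or on a copy of its log) and paste the matching lines into your reply. If nothing matches, the keywords probably do not fit the log's format, and it is better to post the last part of the raw log instead.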

    • Marked as answer by Don Pattee Tuesday, August 11, 2009 1:17 AM
    Friday, July 10, 2009 1:12 PM