"The system cannot find the file specified" - All jobs fail

  • Question

  • We have a three-node cluster running Windows Server 2008 R2 with HPC Pack 2008 R2. After deployment, all the included diagnostic tests passed and MPI jobs ran successfully.

    However, something has since gone wrong: all jobs now fail, and HPC Job Manager returns the above error for every node specified in the job. As far as I know, this is not a problem with access to files or directories, because the error also occurs when running commands such as 'dir' or 'echo'.

    The environment variables are as follows:

    CCP_DATA=C:\Program Files\Microsoft HPC Pack 2008 R2\Data\
    CCP_HOME=C:\Program Files\Microsoft Compute Cluster Pack\
    CCP_INC=C:\Program Files\Microsoft Compute Cluster Pack\Include\
    CCP_LIB32=C:\Program Files\Microsoft Compute Cluster Pack\Lib\i386\
    CCP_LIB64=C:\Program Files\Microsoft Compute Cluster Pack\Lib\amd64\
    CCP_SCHEDULER=<headnode>       // reported correctly; I've just omitted the actual name
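
    One way to double-check a listing like this on each node is to confirm that every path-valued variable points to a directory that actually exists. The following is only a rough Python sketch, not an HPC Pack tool: the helper name find_missing_dirs is made up for illustration, and it assumes Python 3 is available on the node. CCP_SCHEDULER is skipped because it holds a host name rather than a path.

```python
import os

# Variable names taken from the listing above. CCP_SCHEDULER is left out
# on purpose: it holds the head node name, not a filesystem path.
CCP_PATH_VARS = ["CCP_DATA", "CCP_HOME", "CCP_INC", "CCP_LIB32", "CCP_LIB64"]


def find_missing_dirs(env, names=CCP_PATH_VARS, exists=os.path.isdir):
    """Hypothetical helper: return {name: value} for every variable that is
    unset, empty, or points to something that is not an existing directory."""
    missing = {}
    for name in names:
        value = env.get(name)
        if not value or not exists(value):
            missing[name] = value
    return missing


if __name__ == "__main__":
    # Report against the environment of the current process.
    for name, value in find_missing_dirs(os.environ).items():
        print(f"{name} -> {value!r} (unset or not an existing directory)")
```

    An empty result only shows that the directories exist; it says nothing about whether the binaries inside them are intact.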

    Checking Event Viewer simply shows duplicates of the same error reported by the job manager.

    Any help would be greatly appreciated, as this has put some of our software out of action, and short of redeploying I cannot see a solution.

    Monday, June 24, 2013 9:05 AM

All replies

  • Reinstalled HPC Pack and redeployed the nodes, and the issue seems to have gone. It would still be good to know what caused this, though, for future reference.
    Monday, June 24, 2013 12:17 PM
  • I suggest checking the job details ("job view JOB_ID" and "task view JOB_ID"). It looks like your job uses an unknown path to a file or folder, or maybe to binaries. Your env settings are all right.

    Daniel Drypczewski

    Tuesday, July 30, 2013 5:52 AM