We have a three-node cluster running Windows Server 2008 R2 with HPC Pack 2008 R2. After deployment, all the included diagnostic tests passed and MPI jobs ran successfully.
However, something has since gone wrong: all jobs now fail, and the HPC Job Manager returns the above error for every node specified in the job. As far as I know, this is not an issue with access to files or directories, as the error also occurs when
running simple commands such as 'dir' or 'echo'.
The environment variables are as follows:
CCP_DATA=C:\Program Files\Microsoft HPC Pack 2008 R2\Data\
CCP_HOME=C:\Program Files\Microsoft Compute Cluster Pack\
CCP_INC=C:\Program Files\Microsoft Compute Cluster Pack\Include\
CCP_JOBTEMPLATE=Default
CCP_LIB32=C:\Program Files\Microsoft Compute Cluster Pack\Lib\i386\
CCP_LIB64=C:\Program Files\Microsoft Compute Cluster Pack\Lib\amd64\
CCP_SCHEDULER=<headnode> //this is reported correctly I've just omitted the actual name
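For anyone wanting to compare this listing across nodes, a minimal sketch in Python follows. It assumes Python is available on the nodes, which is not part of HPC Pack, and the helper names `ccp_variables` and `install_roots` are hypothetical ones made up for illustration. It prints the CCP_* variables and groups the path-valued ones by install root, which makes it easy to see that CCP_DATA above points under "Microsoft HPC Pack 2008 R2" while the others point under "Microsoft Compute Cluster Pack". Whether that difference is related to the failure is not something I have confirmed.

```python
import os


def ccp_variables(environ=None):
    """Return the CCP_* variables from an environment mapping, sorted by name."""
    env = os.environ if environ is None else environ
    return sorted((k, v) for k, v in env.items() if k.upper().startswith("CCP_"))


def install_roots(variables):
    """Group path-valued variables by their first three path components.

    Values that do not look like a Windows drive path (for example
    CCP_JOBTEMPLATE=Default) are skipped.
    """
    roots = {}
    for name, value in variables:
        parts = value.rstrip("\\").split("\\")
        if len(parts) >= 3 and parts[0].endswith(":"):
            roots.setdefault("\\".join(parts[:3]), []).append(name)
    return roots


if __name__ == "__main__":
    found = ccp_variables()
    for name, value in found:
        print(f"{name}={value}")
    for root, names in install_roots(found).items():
        print(f"{root}: {', '.join(names)}")
```

Running the same script on the head node and on each compute node and diffing the output would show whether any node has a different set of values.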
Checking the Event Viewer simply shows duplicates of the same error reported by the Job Manager.
Any help would be greatly appreciated, as this has put some of our software out of action, and short of redeploying I cannot see a solution.