I have a job with about 700 tasks that usually run several times a day.
This morning I got the following error message for most of the tasks at the same time, at about 2:57 AM: "Error from node:<node name>:Exception 'Safe handle has been closed' reported creating the task."
In the Event Viewer I see:
1. a warning in Windows HPC Server: "Node <node name> has no heartbeat for more than 90000 milliseconds, setting it as unreachable."
2. an error in System: "A timeout (30000 milliseconds) was reached while waiting for a transaction response from the HpcScheduler service."
What happened? No log files were created for my tasks. It seems that they were not submitted by the scheduler.