I have a cluster composed of 1 head node and 4 compute nodes. I'd like to know how to enable logging on each compute node so that I can find possible error messages in the Event Viewer. Right now I only find events under Event Viewer -> Windows HPC Server on the head node; that log is empty on the compute nodes.
I also see that Event Viewer -> Microsoft -> HPC -> Scheduler exists only on the head node. I suppose this is linked to the HPC Job Scheduler Service, which is installed on the head node.
In the end, is it possible to find out on a compute node why a task that ran on it failed?
Thank you in advance for the explanation.
The reason a task failed can be found through the job console, so you don't need the events from all the compute nodes. The exception is an application-level error: in that case you need to enable ETW tracing on all nodes and collect the resulting .etl files (e.g., "clusrun logman start...").
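To make the "clusrun logman start..." idea a bit more concrete, here is a minimal sketch that only builds the command lines you would run from the head node; it does not execute anything. The session name, provider name, output path, and node group below are assumptions for illustration, not values from your cluster. In particular the provider is a placeholder: check which providers actually exist on your nodes with "logman query providers".

```python
# Sketch only: builds clusrun + logman command lines, does not run them.
# All values below are hypothetical; replace them with your own.
SESSION = "AppTrace"                   # assumed trace session name
PROVIDER = "My-Application-Provider"   # placeholder ETW provider name
ETL_PATH = r"C:\Traces\AppTrace.etl"   # assumed path; folder must exist on each node
NODE_GROUP = "ComputeNodes"            # node group to target


def clusrun(node_group, *command):
    """Return a clusrun command that runs `command` on every node in the group."""
    return ["clusrun", f"/nodegroup:{node_group}", *command]


def start_trace(session=SESSION, provider=PROVIDER,
                etl_path=ETL_PATH, node_group=NODE_GROUP):
    """Command that starts an ETW trace session on each node."""
    return clusrun(node_group, "logman", "start", session,
                   "-p", provider, "-o", etl_path, "-ets")


def stop_trace(session=SESSION, node_group=NODE_GROUP):
    """Command that stops the session so the .etl file is flushed to disk."""
    return clusrun(node_group, "logman", "stop", session, "-ets")


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    print(" ".join(start_trace()))
    print(" ".join(stop_trace()))
```

Start the trace, reproduce the failing task, stop the trace, and then copy the .etl file from each compute node for analysis.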