HPC job failure email notification

  • Question

  • Hello Folks,

    I am working on sending an email notification when a job fails in the HPC scheduler.

    I came across the parameters -NotifyOnCompletion $true -EmailAddress <emailid>, and they work whenever an HPC job completes, whether it failed, was canceled, or succeeded.

    I need to send an email notification only if the HPC job fails. We run thousands of jobs every day in HPC and want to see notifications only from failed jobs.

    Please let me know if there is a possible way to do this. Thanks!

    Regards,

    Kevin

    Monday, April 23, 2018 4:54 PM

All replies

  • You could use the email subject filter or the email body filter. Create one or both of them, check $JobState in the script, and customize the output. Refer to the doc below.

    ------------------------------------------------------------------------------------------------------------------------

    Create EmailBodyFilter.ps1/EmailSubjectFilter.ps1 in %CCP_HOME%Bin

    -      EmailSubjectFilter.ps1 has 2 parameters, $JobId and $JobState. The output of the script will be the subject of the email. If the output is empty, the email will not be sent.

               param(
                   [string] $JobId,
                   [string] $JobState
               )

               if ($JobId -eq 85)
               {
                   "Job $JobId is in $JobState state"
               }

    -      EmailBodyFilter.ps1 has 4 parameters: $Cluster, $JobId, $JobName and $JobState. The output of the script will be the content of the email. If the output is empty, the email will not be sent.

               param(
                   [string] $Cluster,
                   [string] $JobId,
                   [string] $JobName,
                   [string] $JobState
               )

               if ($JobState -eq "Running")
               {
                   "This is a notification email from $Cluster.
                   Job $JobName($JobId) is in running state."
               }

    -      Parameter $Cluster is the name of the cluster.

    -      Parameter $JobState could be one of "Running", "Finished", "Failed" or "Canceled".
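
    For the failed-only case asked about above, the same mechanism might be applied as in the rough sketch below. It is an EmailBodyFilter.ps1 that writes a body only when $JobState is "Failed" and writes nothing otherwise, which, per the doc above, should suppress the email. It assumes the jobs are still submitted with -NotifyOnCompletion $true and -EmailAddress. The helper name Get-FailedJobBody and the message wording are made up for illustration; they are not part of HPC Pack.

```powershell
# EmailBodyFilter.ps1 (sketch): produce an email body only for failed jobs.
param(
    [string] $Cluster,
    [string] $JobId,
    [string] $JobName,
    [string] $JobState
)

# Hypothetical helper, named here for illustration only.
function Get-FailedJobBody {
    param(
        [string] $Cluster,
        [string] $JobId,
        [string] $JobName,
        [string] $JobState
    )

    # -eq on strings is case-insensitive in PowerShell.
    if ($JobState -eq "Failed") {
        "This is a notification email from $Cluster."
        "Job $JobName (ID $JobId) is in failed state."
    }
    # Any other state produces no output, so no email should be sent.
}

Get-FailedJobBody -Cluster $Cluster -JobId $JobId -JobName $JobName -JobState $JobState
```

    The script can be checked locally before it is copied to %CCP_HOME%Bin by running it with -JobState Failed and then with -JobState Finished; only the first call should print anything.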


    Tuesday, April 24, 2018 3:06 AM