SQL errors cause job to not be validated

  • Question

  • Hi,

    We have been seeing several of the following errors in the event log on the HPC head node server.

    An unexpected exception occurred. For more information about this exception, see the Details tab. 

     Additional data:
     Expected to update 2 rows, but actually updated 1, for SQL command:
    SET NOCOUNT OFF;

    SET NOCOUNT ON;
    SET NOCOUNT OFF;
    UPDATE Job SET
    ChangeTime = N'2016-07-21 15:37:41.247'
    WHERE ID IN (82288,82290) AND timestamp <= 0x0000000002AE7021


    SET NOCOUNT ON;

    The scheduler was unable to commit a transaction.

    An unexpected exception occurred. For more information about this exception, see the Details tab. 

     Additional data:
     The operation could not be completed because the affected object is already in use by the scheduler.  Please try again later.

    After these errors, the task is marked as failed with the following message:

    an unexpected exception occurred while validating the job. Please try to submitting the job later. If the problem persists, please contact your system administrator or check the HPC server event log for more details

    Do you know why this is happening? We are using HPC 2012 R2 update 4 with an external SQL database. Are there any other troubleshooting steps that we can perform? The task runs fine when it is requeued.

    Thanks!!

    Thursday, July 21, 2016 4:57 PM

All replies

  • Hi,

    This usually means the job record was updated after it was read, so the transaction cannot be committed because the record's timestamp has changed.

    When this error occurs, the current transaction should abort the modification, reload the job, and perform the action again. If the rate of these errors is not high, then retry logic is the fix.
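The reload-and-retry pattern described above can be sketched as follows. This is a minimal illustration only, not the scheduler's actual code: `JobStore`, `JobRecord`, `ConflictError`, and `update_with_retry` are hypothetical names, and an in-memory dictionary with an integer `version` field stands in for the Job table and its timestamp column.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a record changed between the read and the write."""


@dataclass
class JobRecord:
    job_id: int
    state: str
    version: int  # hypothetical stand-in for the SQL timestamp column


class JobStore:
    """Hypothetical in-memory stand-in for the scheduler's Job table."""

    def __init__(self):
        self._rows = {}

    def add(self, job_id, state):
        self._rows[job_id] = JobRecord(job_id, state, version=1)

    def load(self, job_id):
        row = self._rows[job_id]
        # Return a copy so callers hold a snapshot, not the live row.
        return JobRecord(row.job_id, row.state, row.version)

    def update(self, job_id, new_state, expected_version):
        row = self._rows[job_id]
        # Analogous to the timestamp check in the WHERE clause:
        # a stale snapshot must not overwrite a newer row.
        if row.version != expected_version:
            raise ConflictError(f"job {job_id} changed since it was read")
        row.state = new_state
        row.version += 1


def update_with_retry(store, job_id, new_state, max_attempts=3):
    """Reload the job and retry the update when a conflict is detected."""
    for attempt in range(1, max_attempts + 1):
        current = store.load(job_id)  # reload to pick up the latest version
        try:
            store.update(job_id, new_state, current.version)
            return attempt  # number of attempts that were needed
        except ConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
```

A caller would invoke `update_with_retry(store, 82288, "Finished")` and treat a `ConflictError` that escapes as a genuine failure. Capping the attempts keeps a persistently contended record from looping forever.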

    Thanks,
    Evan

    Friday, July 22, 2016 6:59 AM
  • ok that makes sense... 

    What all could update the job record? Could adding tasks to an already running job cause that?

    Thanks!

    Friday, July 22, 2016 5:24 PM
  • There could be many reasons. The scheduler can modify the record while the job is running, and clients can also modify the record through API calls.

    Thanks,
    Evan

    Sunday, July 24, 2016 6:05 AM
  • OK, can you explain what you meant by "if this rate is not high"? Do you mean the number of transactions? Sometimes we get this error when we only have a few jobs running, but those running jobs can have anywhere from 100 to 400 tasks.

    Thanks,

    Nicki

    Monday, July 25, 2016 1:53 PM
  • Hi Nicki,

    You can measure the rate as the count of this exception divided by the total number of operations (transactions). We don't have a specific threshold for the rate, but we know this error can happen, and it should not happen at a high rate.
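As a rough illustration of that ratio, the sketch below keeps two counters and divides them. `ConflictStats` is a hypothetical helper, not part of HPC Pack, and the counts in the usage note are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ConflictStats:
    """Hypothetical counter for tracking how often the conflict occurs."""

    total: int = 0      # all scheduler operations (transactions) observed
    conflicts: int = 0  # operations that failed with the row-count error

    def record(self, conflicted: bool) -> None:
        self.total += 1
        if conflicted:
            self.conflicts += 1

    @property
    def rate(self) -> float:
        # Conflict count divided by total operations; 0.0 before any data.
        return self.conflicts / self.total if self.total else 0.0
```

For example, recording 200 operations of which 3 hit the conflict gives a rate of 3/200, or 1.5%. Whether that counts as "high" is a judgment call, since no threshold is given in this thread.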

    Thanks,
    Evan

    Thursday, July 28, 2016 8:00 AM