Unable to submit a job to one of the compute nodes in a cluster of 20 compute nodes

  • Question

  • Dear Team,

    We have a cluster of 20 compute nodes. The head node cannot submit jobs to one of the compute nodes, but it can start jobs on the other 19.

    What could be the issue?
    Monday, March 19, 2018 7:14 AM

All replies

  • What is the status of that node? Is it reachable? Can you run diagnostic tests on it?

    Qiufang Shi

    Monday, March 19, 2018 7:42 AM
  • @Qiufang Shi, yes, the node is reachable and I am able to run diagnostic tests on it.
    Monday, March 19, 2018 11:38 AM
  • Failed to commit transaction to schedule job 3944 with resources CN0101,1. Error:

    Microsoft.Hpc.Scheduler.Properties.SchedulerException: An exception occurred while attempting to access the scheduler database
    System.Data.SqlClient.SqlException (0x80131904): Cannot insert the value NULL into column 'NodeId', table 'HPCScheduler.dbo.AllocationHistory'; column does not allow nulls. INSERT fails.
    The statement has been terminated.
    [Each of the two messages above is repeated many times in the log.]
       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
       at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite)
       at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
       at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
       at Microsoft.Hpc.Scheduler.Store.StoreSqlCommand.ExecuteNonQuery()
    ClientConnectionId:038297bb-74fa-4bb4-850f-5e8564290ca4
    Error Number:515,State:2,Class:

    This is the error that appears in the logs. I don't know why, but the NodeId is being passed as NULL to the database.

     
    Tuesday, March 20, 2018 2:08 PM
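
A log entry like the one above is easier to read once the repeated messages are collapsed. The following is only a minimal sketch in Python, assuming the log text has been copied into a string; the names `summarize`, `NULL_INSERT`, and `ERROR_NUMBER` are illustrative and not part of HPC Pack. It extracts the table and column named in the SQL Server message, counts the repeats, and reads the SQL error number (515 is SQL Server's "cannot insert NULL" error).

```python
import re

# Matches the SQL Server "cannot insert NULL" message quoted in the log.
NULL_INSERT = re.compile(
    r"Cannot insert the value NULL into column '(?P<column>[^']+)', "
    r"table '(?P<table>[^']+)'"
)
# Matches the trailing "Error Number:515" field.
ERROR_NUMBER = re.compile(r"Error Number:(?P<number>\d+)")


def summarize(log_text):
    """Collapse repeated NULL-insert messages into a short summary."""
    hits = [(m["table"], m["column"]) for m in NULL_INSERT.finditer(log_text)]
    number = ERROR_NUMBER.search(log_text)
    return {
        "targets": sorted(set(hits)),   # distinct (table, column) pairs
        "repeats": len(hits),           # how many times the message occurred
        "sql_error": int(number["number"]) if number else None,
    }


# Shortened stand-in for the real log text (message repeated three times).
sample = (
    "Cannot insert the value NULL into column 'NodeId', table "
    "'HPCScheduler.dbo.AllocationHistory'; column does not allow nulls. "
    "INSERT fails..."
) * 3 + "Error Number:515,State:2,Class:"

print(summarize(sample))
```

A single distinct target with many repeats, as here, suggests one failing insert being retried rather than several different database problems.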
  • We will take a look. Could you tell us the exact version of HPC Pack you're running? It looks like you have a bad node state in the HPC database.

    A quick test you can try: take the node offline and delete it from HPC Pack. It will then report back as unapproved; assign a node template and check whether the node recovers.


    Qiufang Shi

    Wednesday, March 21, 2018 10:19 AM
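
The quick test suggested above (take the node offline, delete it, then assign a template once it reports back) could be written down as a dry run before touching the cluster. This is only a sketch: the Python helper `recovery_commands` is hypothetical, the template name is a placeholder, and the cmdlet names and parameters (`Set-HpcNodeState`, `Remove-HpcNode`, `Assign-HpcNodeTemplate`) are assumptions that should be checked against the HPC Pack PowerShell reference for your version. Nothing is executed; the commands are only printed for review.

```python
def recovery_commands(node, template):
    """Build (but do not run) the PowerShell commands for the quick test.

    Cmdlet names and parameters are assumptions; verify them against the
    HPC Pack PowerShell documentation before running anything.
    """
    return [
        # Step 1: take the node offline.
        f'Set-HpcNodeState -Name "{node}" -State offline',
        # Step 2: delete the node from HPC Pack.
        f'Remove-HpcNode -Name "{node}"',
        # Step 3: once the node has reported back as unapproved,
        # assign a node template to it.
        f'Assign-HpcNodeTemplate -NodeName "{node}" -Name "{template}"',
    ]


# CN0101 is the node named in the error log; the template name is a placeholder.
for command in recovery_commands("CN0101", "Default ComputeNode Template"):
    print(command)
```

Printing the commands first makes it easy to confirm the node name before removing anything from the cluster.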
  • It is the Windows HPC Update 1 version. I tried that: I deleted the node from HPC Pack and reinstalled HPC Pack, and then the node came online.

    I just wanted to know why this happened.
    Wednesday, March 21, 2018 11:40 AM
  • Hi Chandramohanreddy, could you provide the scheduler log under %CCP_HOME%data\logs\scheduler on the head node for investigation?

    Chenling
    Wednesday, March 21, 2018 4:13 PM
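
One way to gather the requested scheduler logs into a single file for upload is sketched below in Python. It assumes the `CCP_HOME` environment variable is set on the head node and that the logs sit in the `data\logs\scheduler` folder named above; the function names and the archive name `scheduler-logs` are illustrative only.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scheduler_log_dir(ccp_home):
    """Return the scheduler log folder under the given CCP_HOME value."""
    return Path(ccp_home) / "data" / "logs" / "scheduler"


def archive_scheduler_logs(ccp_home, dest_base):
    """Zip the scheduler log folder and return the path of the archive."""
    log_dir = scheduler_log_dir(ccp_home)
    if not log_dir.is_dir():
        raise FileNotFoundError(f"scheduler log folder not found: {log_dir}")
    # make_archive appends ".zip" to dest_base and returns the full file name.
    return shutil.make_archive(str(dest_base), "zip", root_dir=str(log_dir))


if __name__ == "__main__":
    home = os.environ.get("CCP_HOME")
    if home:
        target = Path(tempfile.gettempdir()) / "scheduler-logs"
        print(archive_scheduler_logs(home, target))
    else:
        print("CCP_HOME is not set; run this on the head node.")
```

The archive is written to the temporary folder rather than into the log folder itself, so it is not picked up by a later run.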
  • We need the logs to analyze why this happened.

    Qiufang Shi

    Friday, March 23, 2018 3:14 AM