Windows HPC : JobHistory

    Question

  • Hi all,

    My issue concerns HPC job history: using 'Get-HpcJobHistory', I can't seem to retrieve information on consumed resources such as:

    -      ‘core’

    -      ‘socket’

    -      ‘node’

    Is there any way to get all this information?

    Thx in advance for any help

    Regards

    Chanh
    Tuesday, January 31, 2012 4:33 PM

All replies

  • Hi

    You can use "job view [job ID]" to get information about an executed job, as shown below:

    >job view 186283
    Id                              : 186283
    State                           : Failed
    Name                            : testjob1
    Project Name                    :
    Owner                           : Administrator
    Template                        : Default
    Priority                        : Normal
    Resource Request                : 2-10 cores
    Type                            : Batch
    Node Groups                     :
    Requested Nodes                 :
    Allocated Nodes                 : NODE01,NODE02
    Current Allocation              : 6 cores
    Submit Time                     : 2012/02/07 17:33:40
    Start Time                      : 2012/02/07 17:33:40
    End Time                        : 2012/02/07 17:36:20
    Elapsed Time                    : 00:00:02:39
    Wait Time                       : 00:00:00:00
    Run As                          : testuser
    Pending Reason                  :
    Error Message                   :
    Task 186283.1 failed. Please check the failed task for more details on the failure.
    Progress                        : 100%
    Progress Message                :
    Task Count                      : 1
        Configuring tasks           : 0
        Queued tasks                : 0
        Running tasks               : 0
        Finished tasks              : 0
        Failed tasks                : 1
        Canceled tasks              : 0

    As for sockets, the "netstat -anbo | more" command may help you.

    Daniel


    Daniel Drypczewski

    Tuesday, February 07, 2012 9:11 AM
  • Hi Daniel,

    Firstly thx for answering.

    Using 'Get-HpcJobHistory', I obtain a list of job IDs, which I then pass to 'job view ...'.
    The result is :

       The specified Job ID is not valid. Check your Job ID and try again.

    Thx again in advance for telling me what's wrong here.

    Regards,
    Chanh
    Friday, February 10, 2012 8:54 AM
  • I think you're waiting too long before trying to get the info about the job. Two databases are involved here: the scheduler database and the reporting database. The scheduler database holds very detailed information about queued, running, and recently completed jobs. A subset of the job info is moved from the scheduler database to the reporting database periodically (every 15 minutes). In addition, completed jobs are removed from the scheduler database after 5 days (by default; you can change this).

    So I think you are getting some info from the reporting database and then trying to go back to the scheduler database to get additional info, but the job is no longer there.  You need to get the more detailed info you need from the scheduler database within 5 days of completion, before it gets deleted.

    Chris

    Tuesday, February 14, 2012 10:39 PM
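
The retention reasoning in the reply above can be sketched as a small check. This is only an illustration, not part of HPC Pack: the function name `still_in_scheduler_db` is hypothetical, the 5-day default is the value quoted in the reply, and the timestamp layout is assumed to match the 'job view' output shown earlier in this thread.

```python
from datetime import datetime, timedelta

# Assumption: timestamps look like the 'End Time' field printed by 'job view'
# earlier in this thread, e.g. "2012/02/07 17:36:20".
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def still_in_scheduler_db(end_time, now, retention_days=5):
    """Return True if a job that ended at end_time is presumably still held
    in the scheduler database, given the retention period in days.

    retention_days=5 is the default mentioned in the reply; a cluster
    administrator may have configured a different value.
    """
    ended = datetime.strptime(end_time, TIME_FORMAT)
    return now - ended < timedelta(days=retention_days)


# Job 186283 from this thread ended on 2012/02/07 17:36:20.
print(still_in_scheduler_db("2012/02/07 17:36:20", datetime(2012, 2, 10, 8, 54)))
print(still_in_scheduler_db("2012/02/07 17:36:20", datetime(2012, 2, 14, 22, 39)))
```

If the check returns False, 'job view' is likely to report an invalid job ID, and only the subset of fields kept in the reporting database remains available.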
  • Hi Chris,

    Thx to your hints, I can now get 'job view' to work. My problem now is that I can't seem to get info on 'allocated cores'. All I get via 'Current Allocation' is 0 cores ...

    Thx again in advance for telling me what's wrong.

    Regards,
    Chanh

    
    Wednesday, February 15, 2012 3:41 PM
  • With 'job view jobID /detailed', I got all the info I need, i.e.:

    AllocatedCores                   : node1 5
    AllocatedNodes                   : node1 1
    AllocatedSockets                 : node1 2
    
    
    Thursday, February 23, 2012 9:46 AM
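
For anyone who wants to post-process these fields, here is a minimal Python sketch that turns the allocation lines of 'job view jobID /detailed' into a dictionary. The helper name `parse_allocations` is hypothetical, and the parsing rests on an assumption drawn only from the three lines quoted in this thread: each value is a run of "<node> <count>" pairs separated by whitespace. Output from a multi-node job may be laid out differently.

```python
def parse_allocations(output):
    """Map each 'Allocated*' field to a {node_name: count} dictionary.

    Assumption: values are whitespace-separated "<node> <count>" pairs,
    as in the single-node example from this thread.
    """
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon and fields unrelated to allocation.
        if not sep or not key.startswith("Allocated"):
            continue
        tokens = value.split()
        # Pair every even-positioned token (node) with the one after it (count),
        # ignoring pairs whose count is not a plain integer.
        result[key] = {
            node: int(count)
            for node, count in zip(tokens[0::2], tokens[1::2])
            if count.isdigit()
        }
    return result


sample = """\
AllocatedCores                   : node1 5
AllocatedNodes                   : node1 1
AllocatedSockets                 : node1 2
"""

allocations = parse_allocations(sample)
print(allocations["AllocatedCores"])  # prints {'node1': 5}
```

In practice the text would come from capturing the command's output (for example by redirecting it to a file) within the scheduler database's retention period.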