Windows HPC : JobHistory
-
Tuesday, January 31, 2012 4:33 PM
Hi all,
My issue concerns HPC job history: using 'Get-HpcJobHistory', I can't seem to retrieve information on consumed resources such as:
- ‘core’
- ‘socket’
- ‘node’
Is there any way to get all this information?
Thanks in advance for any help.
Regards
Chanh
All Replies
-
Tuesday, February 07, 2012 9:11 AM
Hi
You can use "job view [job ID]" to get information about an executed job, as shown below:
>job view 186283
Id : 186283
State : Failed
Name : testjob1
Project Name :
Owner : Administrator
Template : Default
Priority : Normal
Resource Request : 2-10 cores
Type : Batch
Node Groups :
Requested Nodes :
Allocated Nodes : NODE01,NODE02
Current Allocation : 6 cores
Submit Time : 2012/02/07 17:33:40
Start Time : 2012/02/07 17:33:40
End Time : 2012/02/07 17:36:20
Elapsed Time : 00:00:02:39
Wait Time : 00:00:00:00
Run As : testuser
Pending Reason :
Error Message :
Task 186283.1 failed. Please check the failed task for more details on the failure.
Progress : 100%
Progress Message :
Task Count : 1
Configuring tasks : 0
Queued tasks : 0
Running tasks : 0
Finished tasks : 0
Failed tasks : 1
Canceled tasks : 0
As for sockets, the "netstat -anbo | more" command may help you.
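For anyone who wants to process this kind of output in a script, here is a minimal Python sketch that turns "Key : Value" lines like the ones above into a dictionary. The function name parse_job_view is hypothetical, and the sketch assumes every field is printed as "Key : Value" and that a line without that separator continues the previous field (as with the error message). It only parses text and does not call the scheduler itself.

```python
def parse_job_view(text):
    """Parse 'Key : Value' lines, as printed by `job view`, into a dict.

    Assumption: a line without a ' : ' separator continues the previous field.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the echoed command prompt
        if line.endswith(" :"):
            key, value = line[:-2], ""  # field with an empty value
        elif " : " in line:
            key, _, value = line.partition(" : ")
        elif last_key is not None:
            # Continuation line: append it to the previous field.
            fields[last_key] = (fields[last_key] + " " + line).strip()
            continue
        else:
            continue
        last_key = key.strip()
        fields[last_key] = value.strip()
    return fields


# Sample taken from the output shown in this post (shortened).
sample = """\
Id : 186283
State : Failed
Allocated Nodes : NODE01,NODE02
Current Allocation : 6 cores
Error Message :
Task 186283.1 failed. Please check the failed task for more details on the failure.
Failed tasks : 1
"""

job = parse_job_view(sample)
print(job["State"], job["Allocated Nodes"].split(","))
```

Splitting on " : " (with the surrounding spaces) rather than on a bare colon keeps timestamps such as 17:33:40 intact.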
Daniel
Daniel Drypczewski
-
Friday, February 10, 2012 8:54 AM
Hi Daniel,
First, thanks for answering.
Using 'Get-HpcJobHistory', I obtain a list of job IDs, which I pass to 'job view ...'.
The result is:
The specified Job ID is not valid. Check your Job ID and try again.
Thanks again in advance for telling me what's wrong here.
Regards,
Chanh
-
Tuesday, February 14, 2012 10:39 PM
I think you're waiting too long to try to get the info about the job. Two databases are used here: the scheduler database and the reporting database. The scheduler database has very detailed information about queued, running and recently completed jobs. A subset of the job info is moved from the scheduler database to the reporting database periodically (every 15 minutes). In addition, completed jobs are removed from the scheduler database after 5 days (by default; you can change this).
So I think you are getting some info from the reporting database and then trying to go back to the scheduler database to get additional info, but the job is no longer there. You need to get the more detailed info you need from the scheduler database within 5 days of completion, before it gets deleted.
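To make the timing concrete, here is a small Python sketch of the check described above: is a completed job still inside the retention window of the scheduler database? The 5-day value is the default mentioned in this post; the function name and the two "now" timestamps are made up for illustration, and the real cleanup may not run at the exact moment the window expires.

```python
from datetime import datetime, timedelta

# Default retention for completed jobs, as stated above (configurable).
RETENTION = timedelta(days=5)


def still_in_scheduler_db(end_time, now, retention=RETENTION):
    """Return True if a job that ended at end_time should still be
    within the scheduler database retention window at time now."""
    return now - end_time < retention


# End time in the same format that `job view` prints.
end = datetime.strptime("2012/02/07 17:36:20", "%Y/%m/%d %H:%M:%S")

# Illustrative check times: about 3 days and about 7 days after completion.
print(still_in_scheduler_db(end, datetime(2012, 2, 10, 8, 54)))
print(still_in_scheduler_db(end, datetime(2012, 2, 14, 22, 39)))
```

The first check is inside the window and the second is outside it, which matches the "Job ID is not valid" symptom for older jobs.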
Chris
-
Wednesday, February 15, 2012 3:41 PM
Hi Chris,
Thanks to your hints, I can now get 'job view' to work. My problem now is that I can't seem to get info on allocated cores. All I get via 'Current Allocation' is 0 cores ...
Thanks again in advance for telling me what's wrong.
Regards,
Chanh
-
Thursday, February 23, 2012 9:46 AM
With 'job view jobID /detailed', I got all the info I need, i.e.:
AllocatedCores : node1 5
AllocatedNodes : node1 1
AllocatedSockets : node1 2
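In case it helps someone else, here is a rough Python sketch for turning those three lines into per-node numbers. The function name parse_allocation is hypothetical. It assumes each value is a space-separated sequence of "node count" pairs; I have only seen the single-node form shown above, so the multi-node layout is a guess.

```python
def parse_allocation(lines):
    """Map each 'Allocated...' field to a {node_name: count} dict.

    Assumption: the value is a sequence of 'node count' pairs
    separated by spaces, e.g. 'node1 5'.
    """
    result = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key : Value' line
        tokens = value.split()
        # Walk the tokens two at a time: node name, then its count.
        result[key.strip()] = {
            tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)
        }
    return result


# The three lines reported in this post.
detailed = [
    "AllocatedCores : node1 5",
    "AllocatedNodes : node1 1",
    "AllocatedSockets : node1 2",
]

usage = parse_allocation(detailed)
print(usage["AllocatedCores"]["node1"], usage["AllocatedSockets"]["node1"])
```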