HPCSDM.exe and HPCManagement.exe hogging CPU

  • Question

  • @Microsoft,

    The system has been up for about a day and the CPU is being hogged by two processes, hpcsdm and hpcmanagement. What gives?

    I realize that there is a thread that has been open for almost 6 years about the same issue: https://social.microsoft.com/Forums/en-US/df938330-087e-468b-ab4c-e967894e1c63/hpc-sdm-service-uses-most-of-the-cpu?forum=windowshpcitpros

    I am on Windows Server 2012 R2 with all updates and HPC Pack Update 3.

    I do not want to have to restart the services, since jobs *may* be running.


    • Edited by SRIRAM R Wednesday, June 22, 2016 4:57 PM
    Wednesday, June 22, 2016 4:57 PM

Answers

  • Update:

    I created a support case for MS to look into this.

    Microsoft has provided me with a fix that addresses this issue for now, and says that they will roll the fix out to the general public in the next QFE.

    Issue was in Microsoft.Ccp.Sdm.Core.dll 

    • Marked as answer by SRIRAM R Friday, August 19, 2016 6:50 PM
    • Edited by SRIRAM R Friday, August 19, 2016 6:50 PM
    Friday, August 19, 2016 6:50 PM

All replies

  • Hi, SRIRAM,

    Could you help collect more information? Please run "netstat -ano" and check how many connections there are to ports 6729, 6730, 9892 and 9893.
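
The port check requested above can be tallied with a short script. This is a minimal sketch, assuming the standard five-column TCP rows that "netstat -ano" prints on Windows (protocol, local address, foreign address, state, PID). The function name `count_connections` is hypothetical and is not part of HPC Pack.

```python
from collections import Counter

# Ports named in the reply above.
HPC_PORTS = (6729, 6730, 9892, 9893)


def count_connections(netstat_output, ports=HPC_PORTS):
    """Count TCP rows per (port, state) where either endpoint uses one of `ports`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  local  foreign  STATE  PID
        if len(fields) < 5 or fields[0].upper() != "TCP":
            continue
        local, foreign, state = fields[1], fields[2], fields[3]
        matched = set()
        for address in (local, foreign):
            # rpartition also copes with IPv6 endpoints such as [::1]:9892
            port_text = address.rpartition(":")[2]
            if port_text.isdigit() and int(port_text) in ports:
                matched.add(int(port_text))
        for port in matched:
            counts[(port, state)] += 1
    return counts
```

On the head node, the input could come from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`. Keying the result by state keeps LISTENING rows separate from ESTABLISHED ones, since only the latter are actual client connections.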

    If possible, please send the HpcManagement log to HPC Pack <hpcpack@microsoft.com>.

    The log files are under C:\Program Files\Microsoft HPC Pack 2012\Data\LogFiles\Management.

    The file names are HpcManagement_*.bin and HpcSdm_*.bin. Please send us the last 3-4 files of each, sorted by name,

    for example, HpcManagement_000026.bin, HpcManagement_000027.bin and HpcManagement_000028.bin

    and HpcSdm_000023.bin, HpcSdm_000024.bin, HpcSdm_000025.bin
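
Picking the last few log files by name can be scripted. This is a minimal sketch, assuming the zero-padded numbering shown in the example file names, so that sorting by name matches sequence order. The helper `latest_logs` is hypothetical.

```python
from pathlib import Path


def latest_logs(log_dir, prefix, count=4):
    """Return the last `count` files named <prefix>_*.bin, sorted by file name."""
    if count <= 0:
        return []
    files = sorted(Path(log_dir).glob(f"{prefix}_*.bin"), key=lambda p: p.name)
    return files[-count:]
```

For the directory named above, this would be called once with the prefix "HpcManagement" and once with "HpcSdm".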

    Thursday, June 23, 2016 3:19 AM
  • Hi,

    I have uploaded the requested files to my OneDrive and shared them with the email address you asked me to send them to.

    I am sharing the same link here as well.

    At the time of this post, the cluster is 'idle' with no jobs and just 1 workstation node. It has been less than a day since I last restarted the sdm and management services, and they are hogging the CPU again.

    https://1drv.ms/f/s!AnOkuFPvV8aAivIREtXFVye_C_3SwQ

    Thursday, June 23, 2016 8:44 PM
  • SRIRAM,

    Thanks for collecting the info. I just went through the log but found nothing. The only connections to the HpcManagement and HpcSdm services are from LXN-TD-JV86DX1, which I assume is the workstation.

    We also set up a similar environment yesterday, but could not reproduce the issue.

    So we need more information for further investigation.

    Can you set ManagementTraceLevel to 4? The default is 3. Level 3 is info and level 4 is verbose, so it will collect more log output.

    ManagementTraceLevel is in the registry, under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HPC.

    After changing the trace level, wait several minutes, then send us the log. Thanks!
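
The registry change described above could be made from an elevated prompt with the built-in `reg add` command. The sketch below only builds that command line. It assumes the value is stored as a REG_DWORD, which the thread does not confirm, and the helper `reg_add_command` is hypothetical.

```python
REG_KEY = r"HKLM\SOFTWARE\Microsoft\HPC"
VALUE_NAME = "ManagementTraceLevel"

# Only the two levels mentioned in the reply above.
TRACE_LEVELS = {3: "info (default)", 4: "verbose"}


def reg_add_command(level):
    """Build a `reg add` command that sets ManagementTraceLevel to `level`."""
    if level not in TRACE_LEVELS:
        raise ValueError(f"level must be one of {sorted(TRACE_LEVELS)}")
    return [
        "reg", "add", REG_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",  # assumption: the value type is not stated in the thread
        "/d", str(level),
        "/f",               # overwrite the existing value without prompting
    ]
```

On the head node the command could be run with `subprocess.run(reg_add_command(4), check=True)`. The same helper restores the default with level 3 once the logs have been collected.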

    BTW, if possible, please also capture dumps of the HpcManagement and HpcSdm services; that would help the investigation.

    Thanks again, this should help us troubleshoot.

    Friday, June 24, 2016 2:28 AM
  • There was no change in the log file timestamps after I changed the registry setting, so I restarted the services. Once I see a change in the log file timestamps, I will share those files.

    I have used procdump64.exe against the two exes you asked about. The files have been uploaded to the OneDrive share I specified in my earlier post.

    • Edited by SRIRAM R Monday, June 27, 2016 2:04 PM
    Monday, June 27, 2016 1:48 PM
  • 1) Updated registry setting as recommended

    2) Restarted services

    3) It's been 24 hours and I see CPU at 100%

    4) Uploaded relevant bin files and dmp files to my onedrive shared folder here:

    https://1drv.ms/f/s!AnOkuFPvV8aAivIREtXFVye_C_3SwQ

    Please advise on next step(s).


    Tuesday, June 28, 2016 4:08 PM
  • I just checked the dmp file. It seems that on your head node the CLR version is 4.6.1055, but the current HPC version officially runs on .NET Framework 4.0.

    I will set up an environment that runs the head node on .NET Framework 4.6, to see whether I can reproduce this issue.

    If possible, you can uninstall .NET Framework 4.6 and keep just .NET Framework 4.0.

    BTW, what is the hardware configuration of your head node? How many CPU cores, and how much memory?
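
To confirm which .NET Framework 4.x release is installed without taking a dump, Microsoft documents a `Release` DWORD under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full. The sketch below maps that number to a version using the documented minimum values up to 4.6.2. The function name `framework_version` is hypothetical, and reading the registry value itself is left out so the block stays portable.

```python
# Minimum documented Release values, newest first.
_RELEASE_FLOORS = [
    (394802, "4.6.2 or later"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def framework_version(release):
    """Map the NDP\\v4\\Full Release DWORD to a .NET Framework version label."""
    for floor, version in _RELEASE_FLOORS:
        if release >= floor:
            return version
    return None  # no 4.5-or-later release detected
```

On Windows the input could be read with `winreg.QueryValueEx(key, "Release")[0]` after opening the key above with `winreg.OpenKey`.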

    Thursday, June 30, 2016 1:26 AM
  • 2 cores - Intel Xeon E5-2695 (2.4 GHz)

    4GB RAM

    During installation, I installed HPC Pack Update 3 to the D:\ drive (wherever there was an option).

    I will wait for you to get back with your test results on CLR 4.6.

    Thanks

    Thursday, June 30, 2016 2:44 AM