HPC SDM Service uses most of the CPU

  • Question

  • We have an issue where HpcSdm.exe consumes a large amount of CPU even when there are no jobs running. If the server is rebooted or, at minimum, the service is restarted, CPU usage goes back to normal. Is there any reason this service would be taking up so much CPU?

    HPC 2008 R2 - Topo #2 - Head Node is HP DL380 G6 - 8 GB RAM

     

    Wednesday, October 13, 2010 10:23 PM

All replies

  • Are you running the RTM version or a pre-release of HPC 2008 R2?

    thanks


    pm
    Tuesday, October 19, 2010 7:11 PM
    Moderator
  • RTM build with all available updates. It looks like it happens with certain jobs more than others.
    Wednesday, October 27, 2010 1:49 PM
  • We are experiencing a similar issue. However, all of the CPU is split between HPCSDM.exe and HPCManagement.exe. We have had an open case with Microsoft for several weeks. Microsoft keeps asking more questions but has given no answers. Please let me know if you find any resolution.

    Monday, November 1, 2010 2:50 PM
  • Jeff - Did you hear anything back from Microsoft for a resolution?  We are having the same issue.

    Thursday, January 13, 2011 7:25 PM
  • It would probably be best if you called product support to get some direct assistance with this issue.
    Friday, February 4, 2011 9:59 PM
    Moderator
  • Are you doing any management or administration operations (e.g. moving nodes between groups), or having each node do something with the head node through PowerShell or management commands (clusrun)?

     

    Thursday, February 10, 2011 3:22 AM
  • I am having the exact same issue with the exact same file(s). Has there been any resolution to this problem?
    Saturday, April 20, 2013 7:13 PM
  • Not sure if this issue is solved. Maybe this fix is related to your problem.

    HPC Pack 2008 R2 SP4 Fix for Excessive CPU Usage on High Core Count Machines
    http://www.microsoft.com/en-us/download/details.aspx?id=36797

     

    Daniel Drypczewski

    Thursday, May 9, 2013 1:15 AM
  • Thank you Dan.  This seems to have worked.  If I notice anything funny in the days ahead I will report back.
    Thursday, May 9, 2013 1:36 PM
  • Well, I have been running with the fix for a couple of days and it looks like it is NOT working. Right after I restart the HpcSdm service it uses about 14 percent of the processor. Over the weekend that climbed higher. I'm not sure how much processor this service should be using, so I don't really know what is "normal" and what is abnormal usage. Since restarting the service resets its usage, I decided to write a script and run it daily from Task Scheduler to reset the service automatically. The catch is that you can't just restart HpcSdm, because other HPC services depend on it. I have identified the other services that need to be restarted along with it, and they are listed in the short batch file below. This seems to work fine.

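    rem Replace <your server name here> with the name of the head node before running.
    rem Note: sc returns once the request is sent; it does not wait for the service to change state.
    start /wait sc \\<your server name here> stop HpcManagement
    start /wait sc \\<your server name here> stop HpcReporting
    start /wait sc \\<your server name here> stop HpcSdm

    start /wait sc \\<your server name here> start HpcManagement
    start /wait sc \\<your server name here> start HpcReporting
    start /wait sc \\<your server name here> start HpcSdm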

    Craig 





    Monday, May 13, 2013 1:43 PM
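
Because `sc stop` and `sc start` only send the request and return, a restart script can move on before a service has actually changed state. The following is a minimal Python sketch of the same stop/start sequence that polls `sc query` after each step. It is an illustration, not a tested fix: the service order is copied from the batch file above, `HEADNODE` is a hypothetical server name, and the timeout values are arbitrary assumptions.

```python
import re
import subprocess
import time

# Services in the order used by the batch file in the post above.
SERVICES = ["HpcManagement", "HpcReporting", "HpcSdm"]


def sc_command(server, action, service):
    """Build the argument list for one sc call against a remote server."""
    return ["sc", "\\\\" + server, action, service]


def parse_state(sc_query_output):
    """Extract the state name (RUNNING, STOPPED, ...) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_query_output)
    return match.group(1) if match else None


def wait_for_state(server, service, wanted, timeout_s=120, poll_s=2):
    """Poll `sc query` until the service reports the wanted state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(sc_command(server, "query", service),
                                capture_output=True, text=True)
        if parse_state(result.stdout) == wanted:
            return True
        time.sleep(poll_s)
    return False


def restart_all(server, services=SERVICES):
    """Stop every service, then start them again, waiting after each step."""
    for action, wanted in (("stop", "STOPPED"), ("start", "RUNNING")):
        for service in services:
            subprocess.run(sc_command(server, action, service))
            if not wait_for_state(server, service, wanted):
                raise RuntimeError(f"{service} did not reach {wanted}")


# Example call (hypothetical server name):
# restart_all("HEADNODE")
```

Raising an error when a service does not reach the expected state makes a failed restart visible in the scheduled task's result instead of leaving the head node half restarted.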
  • I have the same issue on HPC 2012 SP1.


    Dave Moyle Systems Architect (FirstMac Ltd)

    Tuesday, November 26, 2013 1:09 AM
  • I have the same issue on HPC 2012 R2 (4.2.4400.0)

    CPU is constant at 25% for HpcSdm.exe

    Tuesday, November 18, 2014 1:49 PM
  • If you create a text file, copy the code I posted above into it, and save it as a batch file (.bat), you can put that file in the root of the server where the processes are running. You can then open Task Scheduler on the server and set up a task to run this quick script once a day. I have had success doing this for over a year now.

    Craig



    Tuesday, November 18, 2014 4:30 PM
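
For anyone who prefers to register the daily task from a script rather than through the Task Scheduler UI, here is a small Python sketch that builds a `schtasks /Create` command. The task name, script path, start time, and the choice to run as SYSTEM are all assumptions for illustration; adjust them to your environment.

```python
import subprocess


def daily_task_command(task_name, script_path, start_time="03:00"):
    """Build a schtasks command that runs script_path once a day."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,     # task name shown in Task Scheduler
        "/TR", script_path,   # program or script to run
        "/SC", "DAILY",       # schedule type
        "/ST", start_time,    # start time, 24-hour HH:mm
        "/RU", "SYSTEM",      # assumption: run under the local SYSTEM account
        "/F",                 # overwrite an existing task with the same name
    ]


# Example call (hypothetical task name and path), run from an elevated prompt:
# subprocess.run(daily_task_command("RestartHpcSdm", r"C:\restart_hpc.bat"), check=True)
```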
  • All management operations are written to the database through the HpcSdm service, so please check the following:

    1. Open HpcClusterManager, go to "Node Management", and select "Operations" in the left navigation bar. Check whether there have been many operations recently. If so, look at the details of those operations and see whether any of them failed.

    2. If there have not been many operations recently, check the SQL Server traffic. If possible, use SQL Server Profiler to capture the traffic on the HpcManagement database and see which requests the HpcSdm service sends to SQL Server. That can help identify the issue.

    Wednesday, November 19, 2014 3:36 AM
  • 1 - There are not many operations: only 2 today, with no errors.

    2 - I cannot set up SQL Profiler against the database.

    When I restart the head node, CPU usage for HpcSdm.exe is 0. After a few days it climbs to around 25%, and after 2-3 weeks it is nearly 70% for HpcSdm.exe alone. I'm running around 10 jobs per day on that HPC cluster.

    Until a fix for 2012 R2 is released, I'm running the script Craig suggested every week.

    Regards, Pier

    Monday, December 29, 2014 12:55 PM
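
To put numbers on a slow climb like the one described above, one option is to sample the process counter with `typeperf` and log the result over time. The Python sketch below parses the CSV text that a command such as `typeperf "\Process(HpcSdm)\% Processor Time" -sc 5` prints. It assumes the usual PDH-CSV layout (a header row followed by timestamp/value rows). Note that this counter is summed across cores, so it is divided by the number of logical CPUs to get a share of the whole machine.

```python
import csv
import io


def parse_typeperf(output):
    """Return the numeric samples found in typeperf's CSV output."""
    samples = []
    for row in csv.reader(io.StringIO(output)):
        if len(row) != 2:
            continue  # blank lines and single-field status messages
        try:
            samples.append(float(row[1]))
        except ValueError:
            continue  # header row, or text that is not a sample
    return samples


def share_of_machine(samples, logical_cpus):
    """Average the samples and scale them to a share of all logical CPUs."""
    if not samples:
        return None
    return sum(samples) / len(samples) / logical_cpus
```

Logging this value once an hour would show whether usage grows steadily or jumps after particular jobs.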
  • I am on 2012 R2 with HPC update 3 and the problem still persists.

    The two offending processes that hog the CPU are HPCSDM.exe and HPCManagement.exe.

    This is in a normal state - 1 workstation node with no jobs running at all.

    Anyone from Microsoft care to answer?

    Wednesday, June 22, 2016 4:52 PM
  • Hi, SRIRAM,

    Do you mean there is only one workstation node in your cluster, besides the head node?

    Can you share more information about your cluster, so that we can try to reproduce this issue locally?

    For example:

    What are the OS versions of your head node and workstation node?

    Both the head node and the workstation node are on HPC 2012 R2 Update 3, right?

    How long has the cluster been installed and running?

    If possible, can you check the traffic to SQL Server? You can use "Resource Monitor".

    If you restart the HpcManagement and HpcSdm services, how much CPU do they consume afterwards?

    Thursday, June 23, 2016 12:52 AM
  • BTW, can you run "netstat -ano" to check how many connections to port 6729, 6730, 9892 and 9893?
    Thursday, June 23, 2016 1:06 AM
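
Counting those connections by eye is tedious on a busy head node. As a rough sketch, the Python function below tallies the TCP lines in `netstat -ano` output per port. It assumes the standard Windows column layout (Proto, Local Address, Foreign Address, State, PID), skips LISTENING sockets, and takes the port numbers from the post above.

```python
from collections import Counter

HPC_PORTS = (6729, 6730, 9892, 9893)  # ports named in the post above


def count_connections(netstat_output, ports=HPC_PORTS):
    """Count non-listening TCP lines whose local or foreign port is in ports."""
    counts = Counter({port: 0 for port in ports})
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "TCP":
            continue  # headers, blank lines, UDP entries
        if fields[3] == "LISTENING":
            continue  # a listening socket is not a connection
        for address in fields[1:3]:  # local address, foreign address
            port_text = address.rsplit(":", 1)[-1]
            if port_text.isdigit() and int(port_text) in counts:
                counts[int(port_text)] += 1
    return dict(counts)


# Example call on the head node:
# import subprocess
# text = subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout
# print(count_connections(text))
```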
  • Head node OS version=2012 R2

    Workstation node = Win 7 Enterprise

    I installed HPC pack Update 3 about 2-3 days ago; All the 'updates' are installed on Head and Workstation node. Head node has been up for about 29 hours before I noticed the CPU hog today.

    Headnode is a brokernode as well. SQL express 2014 is installed on the head node (by HPC pack update 3)


    • Edited by SRIRAM R Thursday, June 23, 2016 1:23 AM
    Thursday, June 23, 2016 1:12 AM
  • We are having similar problems. After a few weeks of cluster uptime, those processes run at 100% CPU. We have doubled the CPUs on the VM head node to 4, but the problem still persists. Restarting those services brings the CPU usage back down, but that is tricky to do when the cluster is in use so much.

    Can anyone advise whether stopping those processes will stop runs from running? At present we don't have any downtime scheduled, so it is tricky to test!

    Windows HPC 2012 R2 4.5.5079.0
    Windows Server 2012 R2


    Thursday, December 8, 2016 10:20 AM
  • Take a look at this thread - https://social.microsoft.com/Forums/en-US/45943329-b0b7-4d6c-aa15-8c6b94267c19/hpcsdmexe-and-hpcmanagementexe-hogging-cpu?forum=windowshpcitpros

    MS had provided me with a 'hot fix' to address the issue.

    Thursday, December 8, 2016 6:43 PM
  • Hi, Cramid,

    As Sriram said, we have already addressed the root cause of this issue. We are currently focused on developing HPC 2016, which will be published this month. Next month we will start work on a general public QFE that includes this fix and many others.

    If you want to try the private fix, let us know your email address or send mail to hpcpack@microsoft.com, and we can send the private fix to you.

    Thanks,

    Yongjun

    Friday, December 9, 2016 1:06 AM