none
HPC2016 Job stuck in Configuring RRS feed

  • Question

  • Hi,

    After upgrading HPC 2016, we met serious Job stuck in configuring tab.

    (more 1500 jobs in configuring, and our environments is 225 nodes having 9000 cores, at that time heat map displays that less 1000 cores are running)

    One time it happens, it takes more 20 minutes to return normal state.

    So, we want to know what cause it and how to resolve it.

    If you need HPC Scheduler log files, please give related contact e-mail address to us.

    Wednesday, August 26, 2020 12:43 AM

All replies

  • Hi Ddobam,

    Which version of HPC Pack 2016 are you on? (HPC Cluster Manager -> Help -> About) If it is 5.3.6435.0, we have a known issue for scheduler hang after restart and it has been fixed in the latest QFE version 5.3.6450.0.

    • HPC Pack 2016 Update 3 QFE KB4537169 (5.3.6450) - 2/14/2020 (Download)

    Regards,

    Yutong Sun

    Wednesday, September 2, 2020 4:33 AM
  • Hi,

    We already read update reference related this known issue, so we re-install v6450 full version to head-node only.

    But, It was not effective to resolve issue.

    Should we re-install or update this version to other client server?

    Currently, server version is 6450, client version is 6435.

    And, based on analyzing HPC Scheduler log, we execute update statistics plan to HPCScheduler instance in SQL Server, and it is effective temporary.

    Can I get optimizing tips for HPCScheduler instance?


    • Edited by Ddobam Tuesday, September 8, 2020 12:24 AM
    Monday, September 7, 2020 11:59 PM