submitting job

  • Question

  • I have noticed a difference between submitting a job through Windows HPC 2016 and running it directly on the compute node.

    My compute node runs Red Hat 7.5, and I am using NAMD with the apoa1 model for benchmarking.

    Running directly I get 25 ns/day; through the HPC scheduler I get about 18 ns/day.

    I am only running on one node, as that is all I have currently. Any ideas?

    Tuesday, September 11, 2018 4:06 PM


All replies

  • Hi,

      We believe this is caused by cgroups. If you disable cgroups, the performance should recover. You could try the following:

    - Open common.sh and comment out the second of the two lines below, so that the node manager treats cgroup as not installed, then run the test again.

    CGInstalled=false

    command -v cgexec > /dev/null 2>&1 && CGInstalled=true
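
    As a minimal sketch of what that change does (the function names detect_cgroup and detect_cgroup_disabled are hypothetical and exist only for this illustration; only the two quoted lines come from common.sh), the flag starts as false and is flipped to true only when cgexec is found on PATH. With the second line commented out it can never become true. Note that a later reply in this thread says tasks fail with cgroup disabled on nodemanager versions before 2.3.4.0.

```shell
#!/bin/sh
# Detection as quoted from common.sh: cgroup counts as installed
# only if the cgexec binary can be found on PATH.
detect_cgroup() {
    CGInstalled=false
    command -v cgexec > /dev/null 2>&1 && CGInstalled=true
    echo "$CGInstalled"
}

# Same logic with the second line commented out: the flag stays false
# regardless of whether cgexec exists.
detect_cgroup_disabled() {
    CGInstalled=false
    # command -v cgexec > /dev/null 2>&1 && CGInstalled=true
    echo "$CGInstalled"
}

echo "as shipped:     $(detect_cgroup)"
echo "line commented: $(detect_cgroup_disabled)"
```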

      cgroups are used to isolate different jobs/tasks on the Linux nodes so that each is limited to the resources assigned to it (mainly CPU). You may disable cgroups, but then your own tasks are responsible for their resource usage.

      Also check your cgroups version: if it is v1, please try v2 and see whether it makes a difference (cgroups v2 has performance improvements).
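
      One common way to check the version is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU coreutils stat and the usual mount point; the function name cgroup_version is hypothetical. A type of cgroup2fs indicates the unified v2 hierarchy, while tmpfs usually indicates v1 (or a hybrid layout). Whether v2 is available at all depends on the kernel and distribution.

```shell
#!/bin/sh
# Report which cgroup hierarchy appears to be mounted at a directory.
# $1: directory to inspect (defaults to /sys/fs/cgroup, an assumption
#     about the usual mount point).
cgroup_version() {
    dir="${1:-/sys/fs/cgroup}"
    # stat -f reports on the filesystem; %T prints its type name.
    fstype=$(stat -fc %T "$dir" 2>/dev/null) || fstype=""
    case "$fstype" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy (or hybrid) hierarchy
        *)         echo "unknown" ;;  # not mounted, or unexpected type
    esac
}

cgroup_version
```

      Running `cat /proc/self/cgroup` inside a submitted task also shows which cgroup the task was placed in.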


    Tuesday, September 11, 2018 11:15 PM
  • First, how do I check whether I have v1 or v2? If I have v1, how do I update to v2?

    Second, when I comment out the line and submit another job, I get the error "/usr/bin/sudo: /usr/bin/sudo: cannot execute binary file"

    The task failed with exit code 126.

    Wednesday, September 12, 2018 3:12 PM
  • Hi,

      Support for running with cgroup disabled was added in nodemanager 2.3.4.0, which is in the HPC Pack 2016 Update 1 QFE: https://www.microsoft.com/en-us/download/details.aspx?id=56964 . If you are already using Update 1, you can follow the QFE doc to apply the updates to your head node and Linux node.

    2.3.4.0

    ================================================================

    1. Fixed a bug that task would fail when cgroup is not enabled


    Qiufang Shi

    • Marked as answer by DCSpooner Thursday, September 13, 2018 8:29 PM
    Wednesday, September 12, 2018 11:53 PM
  • Is there going to be an Update 2, or a Windows HPC 2019 release?
    Thursday, September 13, 2018 8:29 PM
  • HPC Pack 2016 Update 2 will be released this month. I will post in this forum with links to the download page and the doc site.

    After the Update 2 release, we will start working on HPC Pack 2019, which we want to release at the end of next year.

    If you have any specific requirements, please reach us at hpcpack@microsoft.com


    Qiufang Shi

    Friday, September 14, 2018 2:50 AM