HPC scheduler can't allocate cores as job requested.

  • Question

  • We noticed that HPC cannot allocate cores to a job under certain conditions. We think this is an HPC scheduler issue that could affect other HPC users, not just us. The following test reproduces the issue easily.

    Test platform
    HPC Pack 2016 on Windows Server 2016 with all recent updates, on a compute node with 72 cores (4 sockets x 18 cores/socket). The issue also happens with HPC Pack 2012 R2 Update 3 on Windows Server 2012 R2, on any system with more than 64 cores.

    Test workload
    We tried both consume.exe from the Windows Server 2003 Resource Kit and Intel Linpack. Both give the same result. Any command that keeps the cores busy should work.
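
    For readers who have neither tool at hand, a stand-in workload could look roughly like the sketch below. It is not what we ran in our tests. The name burn_cores and the busy-loop approach are assumptions made for illustration only: it starts one child Python process per requested core and lets each spin until a deadline.

```python
import subprocess
import sys

# Hypothetical stand-in for consume.exe or Linpack (an assumption, not the
# tool used in the original test): each child spins until its deadline.
BURN_SNIPPET = (
    "import sys, time\n"
    "deadline = time.perf_counter() + float(sys.argv[1])\n"
    "while time.perf_counter() < deadline:\n"
    "    pass\n"
)


def burn_cores(num_workers, seconds):
    """Start num_workers busy-loop processes and return their exit codes."""
    workers = [
        subprocess.Popen([sys.executable, "-c", BURN_SNIPPET, str(seconds)])
        for _ in range(num_workers)
    ]
    # Wait for every child so the task stays alive while the cores are busy.
    return [worker.wait() for worker in workers]
```

    Calling burn_cores(18, 600) from the job's task would keep up to 18 cores busy for about ten minutes, provided the scheduler really makes that many cores available to the task.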

    Test procedure
    1. Submit a job that requests 18 cores.
    2. Wait until the 18-core job is running and the first 18 cores are busy (you can see the usage in Task Manager).
    3. Submit a job that requests 36 cores.

    Expected result
    We should see 75% total CPU usage.

    Actual result
    We only see 50% total CPU usage.
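
    The arithmetic behind these two numbers can be sketched as follows, assuming every allocated core is fully busy and nothing else is running on the node. The helper names are made up for this illustration.

```python
TOTAL_CORES = 72  # 4 sockets x 18 cores per socket


def cpu_usage_percent(busy_cores, total_cores=TOTAL_CORES):
    """Total CPU usage if busy_cores cores are fully loaded."""
    return 100.0 * busy_cores / total_cores


def busy_cores_from_usage(percent, total_cores=TOTAL_CORES):
    """Number of fully loaded cores implied by a total CPU usage figure."""
    return round(percent / 100.0 * total_cores)


# Expected: the 18-core job plus the 36-core job keep 54 cores busy.
print(cpu_usage_percent(18 + 36))  # 75.0

# Observed: 50% usage means 36 busy cores, so after subtracting the
# first job's 18 cores, the 36-core job is only using 18 cores.
print(busy_cores_from_usage(50) - 18)  # 18
```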

    Possible reason
    Windows divides the cores into two processor groups (kgroups) when a system has more than 64 cores in total. On a system with 72 cores, group 1 and group 2 contain 36 cores each. You can verify this with the group drop-down menu in Task Manager. In my test, the first 18-core job uses half of the cores in group 1. In step 3, HPC tries to allocate the remaining 18 cores of group 1 plus the first 18 cores of group 2 to the 36-core job. Somehow this cross-group allocation fails, and only the 18 cores in group 1 are allocated. That is why the test workload uses only 50% of the CPU instead of 75%. If we reverse the order and submit the 36-core job first and then the 18-core job, CPU usage is 75%, because HPC never has to allocate cores across groups for a single job.
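
    The hypothesis above can be written down as a small model. This is only a sketch of our guess, not the real HPC scheduler logic: it assumes cores are handed out in ascending order and that a job can only use the cores in the first group it lands in. All function names are invented for the illustration, and groups are indexed from 0 in the code (group 1 in the text is index 0).

```python
GROUP_SIZES = [36, 36]  # 72 cores split into two processor groups


def allocate(free, request):
    """Take the lowest-numbered free cores (assumed allocation order)."""
    taken = sorted(free)[:request]
    free.difference_update(taken)
    return taken


def group_of(core):
    """Return the index of the processor group that holds a core."""
    start = 0
    for index, size in enumerate(GROUP_SIZES):
        if core < start + size:
            return index
        start += size
    raise ValueError(f"core {core} is outside all groups")


def usable(cores):
    """Hypothesis: only cores in the job's first group actually get used."""
    first = group_of(cores[0])
    return [core for core in cores if group_of(core) == first]


def simulate(requests):
    """Return total CPU usage (%) after submitting jobs in the given order."""
    free = set(range(sum(GROUP_SIZES)))
    busy = 0
    for request in requests:
        busy += len(usable(allocate(free, request)))
    return 100.0 * busy / sum(GROUP_SIZES)


print(simulate([18, 36]))  # 50.0, the failing order
print(simulate([36, 18]))  # 75.0, the reversed order
```

    Under these assumptions the model reproduces both observations: 50% when the 18-core job goes first and 75% when the 36-core job goes first.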

    Could the HPC developers give it a try?

    • Edited by lijun1234 Wednesday, July 26, 2017 3:09 PM
    Monday, July 24, 2017 5:43 PM

All replies

  • Hi Jun,

      Thanks for reporting this issue. We will try it on our side and update this thread when we have findings.

    Qiufang Shi

    Tuesday, July 25, 2017 4:15 AM
  • Hi Jun,

      We currently don't have servers with 64 cores for compute purposes (the ones with more cores are now Hyper-V hosts, which limits the host OS to a maximum of 64 logical cores). If possible, could you reach us at "HPCPack@microsoft.com" so that we can provide you with private bits with more logging to help us identify the issue? Otherwise, we have to wait until hardware is available to validate the issue.

    Qiufang Shi

    Wednesday, July 26, 2017 1:39 AM
  • Thanks for the quick response. Will do.
    Wednesday, July 26, 2017 2:59 PM