HPC Pack 2016 -- can't run 1-1 cores

  • Question

  • I can't start a 176-task job on a node that has 176 logical processors. NMProxy.exe errors are thrown on the target node, as shown below. I can run the tasks if I allocate 2 cores per task. This occurs even when the task is as simple as a ping. To be clear, I'm creating a job in PowerShell, adding 176 tasks to it, and then starting it. The head node and compute nodes are running Server 2012 R2. Does anyone have any thoughts?

    Faulting application name: NMProxy.exe, version: 5.0.5826.0, time stamp: 0x584e96dc
    Faulting module name: KERNELBASE.dll, version: 6.3.9600.18696, time stamp: 0x59153753
    Exception code: 0xc0000142
    Fault offset: 0x00000000000ece60
    Faulting process id: 0x7c18
    Faulting application start time: 0x01d3017e8191cf26
    Faulting application path: C:\Program Files\Microsoft HPC Pack 2016\Bin\NMProxy.exe
    Faulting module path: KERNELBASE.dll
    Report Id: bf4a46c4-6d71-11e7-8128-405cfdb7e098
    Faulting package full name: 
    Faulting package-relative application ID: 


    Saturday, July 22, 2017 5:14 PM

Answers

  • Hi Jay,

    As you are trying to run 176 tasks in a single noninteractive window station, the desktop heap in that session is likely to become exhausted.
    This can be relieved by changing the registry value HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows. Find the SharedSection entry, which looks like

    SharedSection=1024,20480,768

    and change it to something like

    SharedSection=1024,20480,2048

    You can find more detail in the link Shadab provided.
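
The SharedSection edit described above can be sketched as a plain string transformation. This is only an illustration: the helper name `set_noninteractive_heap` is hypothetical, it assumes the third SharedSection field is the one that sizes the desktop heap for noninteractive window stations (in KB), and it does not touch the registry. Writing the result back to the Windows value and restarting the node would still be done by hand.

```python
import re


def set_noninteractive_heap(value: str, new_kb: int) -> str:
    """Return `value` with the third SharedSection field replaced by new_kb.

    `value` is the text of the SubSystems\\Windows registry value (or just
    the SharedSection=... fragment of it). Hypothetical helper for
    illustration; it only edits the string.
    """
    # Capture "SharedSection=<first>,<second>," and replace what follows.
    pattern = re.compile(r"(SharedSection=\d+,\d+,)\d+")
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_kb}", value, count=1
    )
    if count == 0:
        raise ValueError("no three-field SharedSection entry found")
    return updated


print(set_noninteractive_heap("SharedSection=1024,20480,768", 2048))
# prints: SharedSection=1024,20480,2048
```

Keeping the first two fields untouched mirrors the change suggested above, where only the last number goes from 768 to 2048.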

    Thanks,
    Zihao





    Monday, July 24, 2017 3:30 AM

All replies

  • The tasks fail with exit code -1073741502, which is 0xC0000142 when converted to hex, so I imagine it is the same failure as the KERNELBASE.dll fault listed above. I've tried swapping in older versions of NMProxy.exe from older HPC Packs (2008 R2, 2012 R2) and restarting all the HPC services on both the head and compute nodes, but it results in nearly the same errors.
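
The conversion from the signed exit code to the hex exception code can be checked with a short sketch. The function name `ntstatus_hex` is made up for illustration; the only assumption is that the exit code is a 32-bit value reported as a signed integer.

```python
def ntstatus_hex(exit_code: int) -> str:
    """Format a signed 32-bit process exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as the
    # unsigned 32-bit number with the same bit pattern.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(ntstatus_hex(-1073741502))
# prints: 0xC0000142
```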
    Saturday, July 22, 2017 6:06 PM
  • Hi Jay,

    The error code indicates {DLL Initialization Failed}: https://support.microsoft.com/en-us/help/184802/user32-dll-or-kernel32-dll-does-not-initialize

    It can also point to insufficient resources. Can you please check whether the target machine is running short of any resource?

    ~shadab


    Sunday, July 23, 2017 4:35 PM
  • Hi Jay,

      You have 176 cores on one of your nodes, right? As Shadab said, we believe this is a resource issue, and we will try to reproduce it on our side.

      Meanwhile, you can "under subscribe" your cores to a lower number so that resource contention won't happen. (Select the offline node, right-click, and edit.)


    Qiufang Shi

    Monday, July 24, 2017 1:35 AM
  • That helps.  Do you think that key should be changed during the installation of HPC Pack 2016?  If someone is running the latest cluster software they're probably committed to running many tasks.  Is there a performance hit to changing this key or any other possible side-effect?  I'm experimenting with different values at the moment.  Thanks for your help so far.
    Monday, July 24, 2017 1:52 PM
  • There is a side effect. Increasing the desktop heap size decreases the number of desktops that can be created, so this value should be changed with caution.

    More detail is in the article "User32.dll or Kernel32.dll does not initialize".


    Tuesday, July 25, 2017 1:23 AM