My Application Fails with ERROR_NOT_ENOUGH_MEMORY when submitting a Job with JobUnitType set to Core.

  • Question

  • Hello everyone,

    I have the following problem with Microsoft HPC (SP2).

    I want to submit a job that starts an application we wrote ourselves.

    This application runs well on the HPC cluster when we run it with JobUnitType set to Node.

    When I switch JobUnitType to Core and create a job with 3 tasks, each running the same application, it also works well. When I create a job with 4 tasks (all running the same application) and these tasks start in parallel on the same node (JobUnitType=Core), one instance of the application fails. I believe it is the instance that starts last. When I check the application's error log, I find error code "8", returned from "GetLastError()", and this happens when the application tries to create a dialog. Please keep in mind that the application runs fine with a different JobUnitType setting, or when only 3 tasks start in parallel on one node.

    I can give the following details:

      • The application is written in native C++ (unmanaged code).
      • The application is a 32-bit application.
      • The workstation nodes run 64-bit Windows 7.
      • We are using Microsoft HPC SP2.

      In addition, we found the following: when we run the application multiple times in parallel on this workstation node in a Remote Desktop session (to be precise, I tested 4 and 8 parallel instances), it works well.

      So it seems to me that Microsoft HPC does something different when it starts the application multiple times on one workstation node.

      I currently have the following questions:

      1. How does Microsoft HPC start an application on a compute node? Is it really the same as connecting to a second PC with a Remote Desktop Connection (log on the user, create a desktop, start the application), or is it different? What exactly happens when Microsoft HPC starts my application?
      2. What do I have to change in my application so that it can run more than 3 times in parallel on a workstation node?
      3. Is it possible to change something on my workstation node (a Windows 7 machine), such as a registry key, so that my application can run more than 3 times in parallel on it?

    It would be great if somebody could give me some hints on how to get the application running, either by changing something inside the application or by changing something in our overall environment (HPC, the OS on the workstation nodes, etc.).

    Thanks everybody in advance,

    Bobby

    Friday, October 5, 2012 6:29 PM

Answers

  • Hi Bobby,

    According to http://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx error code 8 means:

    ERROR_NOT_ENOUGH_MEMORY
    8 (0x8)

    Not enough storage is available to process this command.
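    When logging such failures, it may help to write the symbolic name next to the raw number. Below is a minimal C++ sketch. The helper `win32_error_name` is hypothetical (it is not a Windows API) and covers only a handful of codes; the numeric values are the standard ones from the system error code list linked above.

    ```cpp
    #include <string>

    // Hypothetical helper (not part of the Windows API): maps a few common
    // Win32 error codes, as returned by GetLastError(), to symbolic names.
    std::string win32_error_name(unsigned long code) {
        switch (code) {
            case 0:  return "ERROR_SUCCESS";
            case 2:  return "ERROR_FILE_NOT_FOUND";
            case 5:  return "ERROR_ACCESS_DENIED";
            case 8:  return "ERROR_NOT_ENOUGH_MEMORY";
            case 14: return "ERROR_OUTOFMEMORY";
            // Anything outside this small table is reported by number only.
            default: return "UNKNOWN(" + std::to_string(code) + ")";
        }
    }
    ```

    In the application this could be called as `win32_error_name(GetLastError())` immediately after the failing dialog-creation call, before any other API call overwrites the last-error value.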

    Perhaps some instances of your application are starved of resources when several are started in quick succession? When you tried to reproduce the issue manually, did you start the instances one by one? You can try starting multiple processes quickly using the 'start' command and a simple batch script. A CMD one-liner would look something like this (inside a batch file, write %%i instead of %i):

    for /L %i in (1,1,8) do start yourapp.exe

    Regarding your question about how Windows HPC starts processes on compute nodes: by default it is different from running things over Remote Desktop. Processes are started in session 0, and some applications with a GUI may encounter problems because of that. It is, however, possible to start your tasks in a previously created interactive session, or even to use an automatically created console session for that purpose. For more information, please take a look at http://technet.microsoft.com/en-us/library/gg247477(v=ws.10).aspx and http://technet.microsoft.com/en-us/library/dd420457(v=WS.10).aspx
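    To confirm which session an instance actually lands in, the application could log its own session ID at startup. The following is a rough C++ sketch, not a definitive implementation: `describe_session` and `current_session_id` are hypothetical helper names, and the only Windows calls used are `GetCurrentProcessId` and `ProcessIdToSessionId`. Outside a Windows build, the query simply reports failure.

    ```cpp
    #include <string>
    #ifdef _WIN32
    #include <windows.h>
    #endif

    // Hypothetical helper: turns a session ID into a short log message.
    std::string describe_session(unsigned long session_id) {
        if (session_id == 0) {
            return "session 0 (non-interactive services session)";
        }
        return "session " + std::to_string(session_id) + " (possibly interactive)";
    }

    // Hypothetical helper: queries the session of the current process.
    // Returns false if the query fails or if this is not a Windows build.
    bool current_session_id(unsigned long& session_id) {
    #ifdef _WIN32
        DWORD id = 0;
        if (ProcessIdToSessionId(GetCurrentProcessId(), &id)) {
            session_id = id;
            return true;
        }
    #else
        (void)session_id;  // unused outside Windows
    #endif
        return false;
    }
    ```

    Writing `describe_session(id)` to the error log for both a scheduler run and a Remote Desktop run would show whether the failing instances are the ones running in session 0.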

    Please let me know if you have any further questions.

    Regards,
    Łukasz

    • Marked as answer by Bobby013 Friday, March 22, 2013 7:36 PM
    Wednesday, October 10, 2012 4:05 PM