Cannot run parametric sweep job

  • Question

  • Hi all,

    I am trying to test a cluster setup using a parametric sweep job. The job has several tasks; each task is a small "hello world" type executable that writes some text to its own file in a shared directory on the network. The problem is that every job I submit stays in the "Queued" state with the following message:

    The job is pending: Not enough available resources.
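For anyone trying to reproduce this, the per-task "hello world" step can be sketched roughly as below. This is only an illustration, not the actual executable used in the job: the script, the function name `write_task_output`, the file naming scheme, and the command-line arguments are all hypothetical, and it assumes the scheduler passes the sweep index to each task as an argument.

```python
import sys
from pathlib import Path


def write_task_output(task_index: int, shared_dir: str) -> Path:
    """Write a short greeting to a file named after the sweep index."""
    out_dir = Path(shared_dir)
    # Create the target directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per task, so concurrent tasks never write to the same file.
    out_file = out_dir / f"task_{task_index}.txt"
    out_file.write_text(f"Hello from task {task_index}\n", encoding="utf-8")
    return out_file


if __name__ == "__main__":
    # Hypothetical usage: python hello_task.py <sweep_index> <shared_dir>
    index = int(sys.argv[1])
    target = sys.argv[2]
    print(write_task_output(index, target))
```

Because each task writes to its own file, a quick way to see which tasks actually ran is to count the files in the shared directory after the job finishes.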

    Background:

    The network topology is Enterprise with 3 nodes:

    - Head node (Win Server 2012) - 8 cores

    - 2 Compute Nodes (each a virtual machine on a different physical machine - Win Server 2008) - 4 cores each

    In Node Management all nodes are Online, Node Health OK.

    In the default job template I selected the default resource pool (with a weight of 100 - so all 16 cores are enabled).

    I can successfully run Single-Task jobs with the same task as above (write some text to a file). I can submit and execute this on each of the 3 nodes available, so all nodes should be accessible for jobs. However, if I modify the job resources (set the number of cores from Auto to any number larger than 1), the job also fails with the same message as the parametric sweep job. It seems that the job scheduler cannot find and allocate the needed resources.

    I have tried every configuration I can see in the job submission interface and they have all failed. So I am guessing that either I am doing something terribly wrong, or there is something wrong with the configuration of my HPC cluster.

    Any ideas to make this work are greatly appreciated.

    Regards,

    Valentin

    Thursday, July 30, 2015 10:45 AM

Answers

  • The issue went away after disabling and then re-enabling the resource pool.

    Qiufang Shi

    Tuesday, August 4, 2015 1:47 AM

All replies

  • Hi,

      Can you export the job and the related job template to XML files and send them to us at "hpcpack@microsoft.com"? We can take a look at your configuration.

    Qiufang


    Qiufang Shi

    Friday, July 31, 2015 2:08 AM
  • Thanks for the answer! Configuration files have just been emailed to the above mentioned address.
    Friday, July 31, 2015 1:09 PM