Multiple independent tasks using only one core of cluster

  • Question

  • Hi,

    We are experiencing some difficulties with a Cray CX1 cluster running Windows HPC 2008 R2.

    We had a problem with the Domain Controller, which forced us to partially rebuild the cluster because the trust relationships were lost.

    All seemed to be well with the restored system until we started submitting jobs. We then observed that when we submit a job with, for instance, 16 independent tasks, the scheduler no longer distributes the tasks among the available nodes and cores, as it did before the problem occurred.

    Since we are very basic HPC users, we don't have a clue what is wrong with the new configuration, and we need some pointers to what might be the origin of the problem.

    Can anyone give us any clues?




    • Edited by Luis A Cruz Friday, November 2, 2012 10:58 AM
    Friday, November 2, 2012 10:57 AM

All replies

  • Note that if you run 16 tasks on 4 nodes (say each node has 4 cores) and the task runtime is short, the tasks may run on only one or two nodes. Submitting a task to the scheduler takes some time. If the previous task finishes within that time, the next task will most likely occupy the resources the previous task used, giving the impression that only one or two nodes actually worked.
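
    The effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of the real HPC scheduler: it assumes one task is dispatched every `dispatch_interval` seconds and that the scheduler always picks the lowest-numbered idle core. The function name `simulate` and all of its parameters are made up for illustration.

```python
def simulate(num_tasks, num_cores, task_runtime, dispatch_interval):
    """Return the set of core indices that ran at least one task.

    Toy model (an assumption, not the real scheduler's policy):
    one task is dispatched every `dispatch_interval` seconds and
    always lands on the lowest-numbered core that is idle.
    """
    busy_until = [0.0] * num_cores  # time at which each core becomes idle
    used_cores = set()
    clock = 0.0
    for _ in range(num_tasks):
        # If every core is busy, wait until the first one frees up.
        earliest_idle = min(busy_until)
        if earliest_idle > clock:
            clock = earliest_idle
        # Pick the lowest-numbered idle core.
        core = next(c for c in range(num_cores) if busy_until[c] <= clock)
        busy_until[core] = clock + task_runtime
        used_cores.add(core)
        # The next task cannot be dispatched before the interval has passed.
        clock += dispatch_interval
    return used_cores


if __name__ == "__main__":
    short = simulate(16, 16, task_runtime=0.5, dispatch_interval=1.0)
    long_ = simulate(16, 16, task_runtime=60.0, dispatch_interval=1.0)
    print("short tasks used", len(short), "core(s)")
    print("long tasks used", len(long_), "core(s)")
```

    In this model, tasks that finish before the next one is dispatched keep reusing the same core, while tasks that outlast the dispatch interval spread across the cores.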

    What are the parameters of your task?

    Try this:

    1. Give the tasks a long run time

    2. Assign fixed nodes to the tasks (job add <jobId> /requirednodes:)
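
    As a rough sketch of step 2, the script below builds one `job add` command line per task and pins each task to a node in round-robin order. The node names, the job id and the task commands are hypothetical placeholders, and the helper `build_job_add_commands` is made up for illustration. The script only prints the command lines and does not run them.

```python
def build_job_add_commands(job_id, task_commands, node_names):
    """Build one 'job add' argument list per task.

    Each task is pinned to a single node with /requirednodes:,
    cycling through node_names so the tasks are spread evenly.
    """
    commands = []
    for index, task_command in enumerate(task_commands):
        node = node_names[index % len(node_names)]
        commands.append(
            ["job", "add", str(job_id), f"/requirednodes:{node}", task_command]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical values: replace with your own job id, nodes and commands.
    nodes = ["NODE01", "NODE02", "NODE03", "NODE04"]
    tasks = [f"mytask.exe {i}" for i in range(16)]
    for argv in build_job_add_commands(42, tasks, nodes):
        print(" ".join(argv))
```

    If the tasks then run on the nodes they were pinned to, the nodes themselves are reachable and the problem is more likely in how the scheduler allocates resources.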

    If the tasks are owned by different users, some of the user credentials stored by the scheduler may be missing after the DC reinstallation. Confirm that your users have permission to run jobs (Cluster Manager/Configuration/Users), and re-add your users.

    Daniel Drypczewski

    Tuesday, November 6, 2012 6:16 AM