HPC R2: More HPC service instances running on node than there are cores

  • Question

  • We are using the SOA model for our HPC applications.

    In HPC (v2), we found that when using the core resource allocation model, we would see a maximum number of HpcServiceHost.exe processes on a node equal to the number of cores on a node.

    In HPC R2, we want the same behaviour, but when we submit a job, it can start more HpcServiceHost.exe processes on a node than there are cores on that node.

    Is this change to be expected in HPC R2? Or is this a sign of some possible misconfiguration on our cluster?

    Wednesday, January 19, 2011 9:54 PM

Answers

  • We have found the cause of this unexpected behaviour.

    We are using VMs for our HPC R2 compute nodes. We had changed the VM configuration from 4-core to 2-core and rebooted the compute nodes.

    In the HPC Cluster Manager Node Management table, it appeared that HPC had recognized this change, as the number of cores shown for the nodes was 2.

    However, in the View Job/Resource Selection window for the job running 4 service instances on a 2-core node, it still showed 4 cores for these nodes.

    After we took the nodes offline and reprovisioned them by "assigning node template", the View Job/Resource Selection window showed the expected 2 cores, and only 2 service instances would start on the 2-core nodes!

    Sorry for any confusion this post caused!

    Thursday, January 20, 2011 6:22 PM
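
The check described in this answer, comparing the core count shown in Node Management with the one shown in the job's Resource Selection view, can be sketched roughly as follows. This is only an illustration: the function name, the node names, and the two dictionaries are hypothetical, and the counts are assumed to have been copied by hand from the two views rather than read through any HPC API.

```python
def find_core_mismatches(node_view, scheduler_view):
    """Return {node: (node_view_cores, scheduler_view_cores)} where the two views disagree."""
    mismatches = {}
    for node, cores in node_view.items():
        scheduled = scheduler_view.get(node)
        # Skip nodes that the scheduler view does not list at all.
        if scheduled is not None and scheduled != cores:
            mismatches[node] = (cores, scheduled)
    return mismatches


# Hypothetical values: NODE01 was resized from 4 to 2 cores,
# but the scheduler view still reports the old count.
node_view = {"NODE01": 2, "NODE02": 2}
scheduler_view = {"NODE01": 4, "NODE02": 2}

stale_nodes = find_core_mismatches(node_view, scheduler_view)
print(stale_nodes)
```

Any node that shows up in the result would be a candidate for taking offline and reassigning the node template, as described above.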

All replies

  • This is actually good feedback! :)  I've seen this behavior in the past, where the core count had different values depending on where the information was viewed.  I had no idea that reassigning the node template would take care of the problem - thanks!
    Thursday, January 20, 2011 9:47 PM
  • Another way to get the node information updated after a 'hardware' change is to take the node offline and wait about 5-10 minutes until the numbers are updated. If you bring the node back online before the update happens, the update will not be performed while the node is online.

    Thanks,
    Łukasz

    Thursday, January 20, 2011 9:59 PM
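
The "take the node offline and wait" approach from the reply above could be scripted as a simple polling loop. The sketch below is a guess at how that might look, not a tested procedure: `read_cores` is a hypothetical callable standing in for however you read the node's reported core count, and the 600-second default timeout is an assumption based on the 5-10 minutes mentioned above.

```python
import time


def wait_for_core_update(read_cores, expected, timeout_s=600, interval_s=30,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll read_cores() until it returns `expected` or the timeout expires.

    Returns True if the expected count was seen, False on timeout.
    The node is assumed to stay offline for the whole wait.
    """
    deadline = clock() + timeout_s
    while True:
        if read_cores() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with a fake reader that reports the stale count twice and then the new one.
# sleep is replaced with a no-op so the example finishes immediately.
readings = iter([4, 4, 2])
updated = wait_for_core_update(lambda: next(readings), expected=2,
                               sleep=lambda seconds: None)
print(updated)
```

Only bring the node back online once the function returns True; if it returns False, reassigning the node template as in the accepted answer is the other option.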