HPC Pack 2016 Cannot monitor resources in HPC Job Manager

  • Question

  • Hello,

    Our cluster is running Windows HPC Pack 2016 version 5.2.6291.0. We are having trouble monitoring resources with HPC Job Manager.

    Once connected to the head node, if we switch to Cluster Resources in HPC Job Manager, the Heat Map tab shows boxes representing the nodes of the cluster, but they are all crossed out. If we try to add a custom tab, we cannot select any metric from the drop-down lists in the Customize Tab window; they are all empty.

    HPC Job Manager otherwise seems to work without a problem, and the Job Management functionality works fine. Note also that if I run HPC Job Manager from the head node or one of the compute nodes, the Cluster Resources part works fine. We only have issues on machines outside the cluster.

    What could be causing this, and how can we fix it? Does the Cluster Resources part of HPC Job Manager require certification to work? I ask because I have a separate issue, for which I just posted a question in this support forum, that also happens on computers outside the cluster but not on computers within it.

    Thanks,

    -Michael

    Tuesday, January 22, 2019 8:45 PM

Answers

  • Yes, we confirmed this is the same issue as the related post. Please apply the simple fix and retry it.

    Regards,

    Yutong Sun

    • Marked as answer by MichaelEnders Monday, January 28, 2019 1:09 PM
    Friday, January 25, 2019 10:01 AM

All replies

  • Hi Michael,

    Is your client machine on the same network as your cluster? The heat map view requires a UDP connection to the Monitoring service on the head node. If a firewall is blocking that connection, the heat map cannot be displayed.

    Regards,

    Yutong Sun

    Wednesday, January 23, 2019 9:18 AM
  • Hi Yutong,

    I tested a few things, and it seems that something broke between HPC Pack versions 5.2.6277.0 (from "HPCPack2016Update2-Full-v6277") and 5.2.6291.0 (from "KB4481650_x64").

    • With headnode, compute nodes and clients on version 5.2.6291.0, I'm having the issues described in the original post.
    • With headnode and compute nodes on version 5.2.6291.0, and clients on version 5.2.6277.0, I'm able to monitor the resources as expected. I still have the issue discussed here since the client doesn't have the patch applied, but otherwise I can monitor the resources.
    • If I apply the patch on the client and bring everything back to 5.2.6291.0, then the issues in the original post return.
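
The three test results above can be written down as a small check. This is only a sketch for illustration: the helper names `parse_version` and `matches_reported_failure` are made up here and are not part of HPC Pack, and the check encodes nothing beyond the two builds and the combinations reported in this thread.

```python
def parse_version(text):
    """Split a dotted version string such as '5.2.6291.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# The two builds named in this thread.
UPDATE2 = parse_version("5.2.6277.0")  # from "HPCPack2016Update2-Full-v6277"
PATCHED = parse_version("5.2.6291.0")  # from "KB4481650_x64"


def matches_reported_failure(cluster_version, client_version):
    """Return True only for the combination reported as failing in this thread:
    head node/compute nodes and client all on the patched build.
    Other combinations were either reported as working or not tested."""
    return (parse_version(cluster_version) == PATCHED
            and parse_version(client_version) == PATCHED)


if __name__ == "__main__":
    # Cluster and client both patched: the reported failing case.
    print(matches_reported_failure("5.2.6291.0", "5.2.6291.0"))
    # Patched cluster, Update 2 client: reported as working.
    print(matches_reported_failure("5.2.6291.0", "5.2.6277.0"))
```

Parsing the versions into integer tuples avoids comparing them as strings, so builds can also be ordered numerically if needed.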

    FYI, I have not tried this with everything at 5.2.6277.0, since I don't want to reinstall and reconfigure the head node and compute nodes. I assume it works the same as in the mixed-version case above.

    A 'separate' issue also seems to have been introduced, which is related to this post. Are you able to confirm the issue on 5.2.6291.0?

    Regards,

    -Michael

    Thursday, January 24, 2019 1:47 PM
  • Hello Yutong,

    I can confirm that the registry key modification suggested in the related post worked.

    Thanks for coming up with a solution so quickly.

    Regards,

    -Michael

    Monday, January 28, 2019 1:09 PM