Nodes not reporting metrics information

    Question

  • Has anybody experienced, and found a resolution for, compute nodes not reporting metric information?

    We are running a 16-node Windows Server 2008 R2 Standard cluster with HPC Pack 2008 R2 SP4. All the nodes are identical hardware (HP BL460 G8s) in the same HP blade chassis and were bare-metal deployed from the same node template. Of the 16 nodes, 7 are not reporting any node metrics in the heat map: each one shows an X in its box, and its metrics read NaN.

    I've tried restarting all the HPC-related services, removing and re-adding the nodes (without doing a bare-metal build), and restarting the nodes and the head node, all without any change.

    Any suggestions?

    Thanks

      Eric Sten

    Tuesday, August 19, 2014 1:35 PM

All replies

  • Hello Eric,

    Sorry for the delay.

    You can try restarting the HpcMonitoringClient service on the compute nodes (CNs) that are not reporting.

    That service collects metrics on the CNs and reports the data to the HpcMonitoringServer service running on the head node (HN).

    Restarting the service does no harm. It won't affect running jobs.

    Wednesday, August 27, 2014 2:58 AM
  • I apologize for the delay as well. There is no HpcMonitoringClient service on either the reporting or the non-reporting CNs. The only HPC services are:

    HPC Management Service

    HPC MPI Service

    HPC Node Manager Service

    I've restarted all these services multiple times with no results.  Any other suggestions?

    Thanks!

       Eric Sten

    Tuesday, October 14, 2014 12:55 PM
  • OK, a 2008 R2 cluster does not have the Monitoring Server/Client services; they were added in HPC Pack 2012.

    In 2008 R2, the heat map and node metrics depend on Windows performance counters.

    Since some of your nodes are reporting metric data, I think your head node is fine.

    So please check in Perfmon on the affected CNs whether the performance counters behind the HPC metrics are working, for example the CPU and memory counters.

    Wednesday, October 15, 2014 2:18 AM