The operation 'Discovering the configuration of the head node' failed to run correctly. The operation was initiated by the user: SYSTEM.

  • Question

  • Hello all,

    We use Microsoft HPC Pack 2012 R2.

    For months, the event log of our head node has contained an error entry in the Management log with the following text:

    The operation 'Discovering the configuration of the head node' failed to run correctly. The operation was initiated by the user: SYSTEM.

    This message appears regularly, sometimes more than once per minute, sometimes only 4-6 times an hour.

    In general, the message appears less often when the head node has been freshly rebooted; the longer it runs, the more often we get this entry.

    We would like to know the background of this activity and why it runs so often. We would also like to know what we can do to stop this error message from appearing on our server.

    It seems that something is going wrong, but we have no clue how to solve it.

    Any suggestions on how to solve this are welcome.

    Thank you very much for your support,

    best regards,

    Bobby

    Tuesday, February 17, 2015 10:36 PM

All replies

  • Hi,

    Besides 'Discovering the configuration of the head node', are there other operations, such as "Updating cluster configuration" or "Updating head node configuration"? If so, please check whether there are any error or warning logs for those operations.

    The head node checks its configuration every minute. If nothing has changed, it does nothing; if it finds a change, it runs an operation such as the one you see.

    Thursday, February 26, 2015 8:42 AM
  • Hi Yongjun,

    Thank you very much for your answer. Under ClusterManager->NodeManagement->Operations there are multiple operations. All of them are Committed or Archived, and none show errors.

    When I check the event log of the head node, I see only this recurring error message. There are some additional informational messages, but nothing more.

    The only thing I notice is that there are two variants of the same error message:

    1.) A short one:

    The operation 'Discovering the configuration of the head node' failed to run correctly. The operation was initiated by the user: SYSTEM.

    The session log for this event is as follows:

    Error 7000: An item with the same key has already been added.

    The operation can be identified by the GUID: 3fdbd627-5f84-4967-9878-1ab7e99cf375. Using this GUID a log of the operation can be obtained from the HPC PowerShell command: Get-HpcOperation -id 3fdbd627-5f84-4967-9878-1ab7e99cf375 | Get-HpcOperationLog.

    2.) A more detailed one:

    The operation 'Discovering the configuration of the head node' failed to run correctly. The operation was initiated by the user: SYSTEM.

    The session log for this event is as follows:

    Error 7000: An item with the same key has already been added.
    Info 0000: Discovering interfaces on computer CW01\LUSS021
    Info 0000: Filtering interface Loopback Pseudo-Interface 1 out (loopback or tunnel device)
    Info 0000: Fetching processor information.
    Info 0000: Updating processor information
    Info 0000: Fetching memory information
    Info 0000: Fetching operating system information
    Info 0000: Skipping WDS Validation
    Info 0000: Validating Windows Firewall
    Info 0000: Fetching firewall settings for profile PUBLIC
    Info 0000: Fetching firewall settings for profile PRIVATE
    Info 0000: Fetching firewall settings for profile DOMAIN
    Info 0000: Fetching information for the volume A:\
    Info 0000: Fetching information for the volume C:\
    Info 0000: Fetching information for the volume Z:\
    Info 0000: Checking the membership of the HPCJobAdministrators group
    Info 0000: Checking the membership of the HPCJobOperators group
    Info 0000: Checking the membership of the HPCUsers group
    Info 0000: Checking the membership of the Administrators group

    The operation can be identified by the GUID: c6980072-89a1-41ac-ab27-a446b610dab0. Using this GUID a log of the operation can be obtained from the HPC PowerShell command: Get-HpcOperation -id c6980072-89a1-41ac-ab27-a446b610dab0 | Get-HpcOperationLog.

    Why does this error message appear so many times in the event log? Checking today's log, there are around 20 entries of this type. Do we have an issue with our head node?

    best regards,

    Bobby



    • Edited by Bobby013 Thursday, February 26, 2015 10:40 AM Added Info
    Thursday, February 26, 2015 10:37 AM
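
Each event message quoted above ends with a GUID and the matching HPC PowerShell command. When there are many such events, it can help to pull the GUID out of the event text and rebuild that command automatically. The following is a minimal sketch, assuming the event text keeps the "GUID: ..." wording shown in this thread; the function name `operation_log_command` is made up for illustration.

```python
import re

# Standard 8-4-4-4-12 hexadecimal GUID layout, preceded by the "GUID:" label
# that appears in the event text quoted in this thread.
GUID_RE = re.compile(
    r"GUID:\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def operation_log_command(event_text):
    """Return the HPC PowerShell command for the first GUID found, or None."""
    match = GUID_RE.search(event_text)
    if match is None:
        return None
    guid = match.group(1)
    # Same command form that the event message itself suggests.
    return f"Get-HpcOperation -id {guid} | Get-HpcOperationLog"
```

The returned string is meant to be pasted into an HPC PowerShell session; the sketch does not run the command itself.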
  • Hi, Bobby,

    If the operation 'Discovering the configuration of the head node' fails, the head node retries it at the next interval (the interval should be 1 minute).

    From the error "An item with the same key has already been added" alone, I cannot tell what the exact cause is.

    CW01\LUSS021 is your head node name, right? Is it a physical machine or a virtual machine? Did you make any system changes recently?

    You can find the log for the HpcManagement service under %CCP_HOME%Data\LogFiles\Management. Open a command console (run as admin) and run:

    cd %CCP_HOME%Data\LogFiles\Management

    hpctrace parselog <logfilename>

    The log files for HpcManagement are named HpcManagement_******.bin. The most recent log file is always empty, so parse the second-to-last one.

    If there are more log entries around "An item with the same key has already been added", please paste them here. Thanks!

    Friday, February 27, 2015 6:47 AM
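
Picking "the last file but one" by hand is easy to get wrong in a folder with many logs. The following is a rough sketch of that selection step, assuming the files match HpcManagement_*.bin as described above and that "last" means most recently modified (the post does not say how the files are ordered). The function name `second_newest_log` is hypothetical.

```python
from pathlib import Path


def second_newest_log(log_dir, pattern="HpcManagement_*.bin"):
    """Return the second most recently modified matching file, or None."""
    # Sort oldest to newest by modification time (an assumption about
    # what "last" means for these log files).
    logs = sorted(Path(log_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if len(logs) < 2:
        return None
    return logs[-2]
```

The returned path can then be passed to `hpctrace parselog` as the log file name.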
  • Hi Yongjun,

    Thank you very much for your answer. You are right, LUSS021 is our head node. It is a virtual machine, and we have not made any changes to it in the last few days.
    Unfortunately, we have had this error for a long time. (I checked the log as you described and found that this error first occurred on 12 Dec. 2014.)
    We only noticed it because we had some CPU load issues on this head node.
    We found that when the head node was at 100% CPU load, exactly this message was logged repeatedly, every minute or more often.
    When we detected this, we rebooted the head node. The log flooding then stopped: the message came only about once an hour, and the CPU load was normal again.

    Please also see another thread with a plot of the CPU load: CPU Load of Head Node increase in Big Steps

    I extracted the log content around that error message, as you advised:

    02/25/2015 17:50:04.342 i HpcManagement 4696 4260 [Change  ] Failed to execute change handler.  
    02/25/2015 17:50:04.342 e HpcManagement 4696 4260 [Change  ] Exception:.System.ArgumentException: An item with the same key has already been added...   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)..   at Microsoft.ComputeCluster.Management.ComputerModel.ComputerContainsSecurityGroup.Validate(ISession session)..   at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.ValidateSecurityGroups(ISession session, Boolean isHeadNode)..   at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.ValidateOs(ISession session)..   at Microsoft.ComputeCluster.Management.ClusterModel.ClusterNode.Validate(ISession session)..   at Microsoft.SystemDefinitionModel.SystemManager.DiscoverChange.OnExecute(ISession session)..   at Microsoft.SystemDefinitionModel.Change.Transition(ChangeTransition stateTransition, SdmErrorCollection errors)  
    02/25/2015 17:50:04.357 i HpcManagement 4696 4260 [Model   ] Error: 7000 An item with the same key has already been added.  
    02/25/2015 17:50:04.357 i HpcManagement 4696 4260 [Change  ] Rolling back change Discovering the configuration of the head node, 1d4fa1db-f9a2-456b-8d4c-9b90a93614a2  
    02/25/2015 17:50:04.357 i HpcManagement 4696 4260 [InstSpace] Changes are being applied out of order, the current state of the view doesn't match the change being reverted 9f6bd21e-0ccc-4e2c-8a45-16cb4d410525,ae710494-a0e9-423e-a4ed-0f1aab98f5df,6!=9f6bd21e-0ccc-4e2c-8a45-16cb4d410525,1d4fa1db-f9a2-456b-8d4c-9b90a93614a2,7  
    02/25/2015 17:50:04.357 i HpcManagement 4696 4260 [InstSpace] Changes are being applied out of order, the current state of the view doesn't match the change being reverted 08a9c996-0fc5-4ad4-9b2b-89cc7ac3ec21,b5e7901c-48cd-4bdd-bb23-7a96350e89c2,8!=08a9c996-0fc5-4ad4-9b2b-89cc7ac3ec21,1d4fa1db-f9a2-456b-8d4c-9b90a93614a2,9  
    02/25/2015 17:50:04.357 e HpcManagement 4696 4260 [HpcManagement] The operation 'Discovering the configuration of the head node'  failed to run correctly. The operation was initiated by the user: SYSTEM. The operation can be identified by the GUID: 1d4fa1db-f9a2-456b-8d4c-9b90a93614a2. Using this GUID a log of the operation can be obtained from the HPC PowerShell command: Get-HpcOperation -id 1d4fa1db-f9a2-456b-8d4c-9b90a93614a2 | Get-HpcOperationLog  
    02/25/2015 17:50:04.373 i HpcManagement 4696 4260 [Change  ] Change applied  
    02/25/2015 17:50:04.373 i HpcManagement 4696 4260 [Change  ] Disposing of a modelUpdate

    The first log entry where this error occurred looks like this:

    12/12/2014 02:20:10.871 i HpcManagement 3768 5056 [AzurePerformanceMonitor] Executing AzurePerformanceMonitor  
    12/12/2014 02:20:13.027 i HpcManagement 3768 5056 [HpcManagement] Discovering model  
    12/12/2014 02:20:13.027 i HpcManagement 3768 5056 [Change  ] Commiting change  
    12/12/2014 02:20:13.027 i HpcManagement 3768 5056 [Change  ] Adding referenced instance 7e3e53d8-cfbc-447a-9938-3eebbc0c9784,467359a8-3bc7-422f-ad67-ed1b62c1478d,3 to change Discovering the configuration of the head node.  
    12/12/2014 02:20:13.027 i HpcManagement 3768 5056 [Change  ] Adding referenced instance 36cf6e1b-f6f5-4772-af32-58cde60d4c0c,bd4338c9-f230-4078-90ef-589435232c7f,3 to change Discovering the configuration of the head node.  
    12/12/2014 02:20:13.027 i HpcManagement 3768 5056 [Change  ] Executing change Discovering the configuration of the head node, 8c832a30-9425-4e58-8b47-c99a2cd98494  
    12/12/2014 02:20:13.074 i HpcManagement 3768 5056 [Change  ] Adding referenced instance 08a9c996-0fc5-4ad4-9b2b-89cc7ac3ec21,bd4338c9-f230-4078-90ef-589435232c7f,3 to change Discovering the configuration of the head node.  
    12/12/2014 02:20:13.074 i HpcManagement 3768 5056 [Change  ] Adding updated instance Administrators to change Discovering the configuration of the head node.  
    12/12/2014 02:20:13.074 i HpcManagement 3768 5056 [Change  ] Failed to execute change handler.  
    12/12/2014 02:20:13.152 e HpcManagement 3768 5056 [Change  ] Exception:.System.ArgumentException: An item with the same key has already been added...   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)..   at Microsoft.ComputeCluster.Management.ComputerModel.ComputerContainsSecurityGroup.Validate(ISession session)..   at Microsoft.SystemDefinitionModel.SystemManager.DiscoverChange.OnExecute(ISession session)..   at Microsoft.SystemDefinitionModel.Change.Transition(ChangeTransition stateTransition, SdmErrorCollection errors)  
    12/12/2014 02:20:13.168 i HpcManagement 3768 5056 [Model   ] Error: 7000 An item with the same key has already been added.  
    12/12/2014 02:20:13.168 i HpcManagement 3768 5056 [Change  ] Rolling back change Discovering the configuration of the head node, 8c832a30-9425-4e58-8b47-c99a2cd98494  
    12/12/2014 02:20:13.184 i HpcManagement 3768 5056 [InstSpace] Changes are being applied out of order, the current state of the view doesn't match the change being reverted 08a9c996-0fc5-4ad4-9b2b-89cc7ac3ec21,bd4338c9-f230-4078-90ef-589435232c7f,3!=08a9c996-0fc5-4ad4-9b2b-89cc7ac3ec21,8c832a30-9425-4e58-8b47-c99a2cd98494,4  
    12/12/2014 02:20:13.184 e HpcManagement 3768 5056 [HpcManagement] The operation 'Discovering the configuration of the head node'  failed to run correctly. The operation was initiated by the user: SYSTEM. The operation can be identified by the GUID: 8c832a30-9425-4e58-8b47-c99a2cd98494. Using this GUID a log of the operation can be obtained from the HPC PowerShell command: Get-HpcOperation -id 8c832a30-9425-4e58-8b47-c99a2cd98494 | Get-HpcOperationLog  
    12/12/2014 02:20:13.199 i HpcManagement 3768 5056 [Change  ] Change applied  
    12/12/2014 02:20:13.199 i HpcManagement 3768 5056 [Change  ] Disposing of a modelUpdate  

    I hope this information helps you understand what is going on.
    If you need more information, please let me know and I can provide it.

    Thank you very much for your help,

    best regards,

    Bobby



    • Edited by Bobby013 Friday, February 27, 2015 11:11 AM Improved Readability
    Friday, February 27, 2015 11:10 AM
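
To see how often the failure really occurs over time, the parsed log text can be counted per hour. The sketch below assumes the line layout visible in the excerpts above (timestamp, one-letter level, service name, two numeric IDs, a bracketed area, then the message), that the dates are MM/DD/YYYY, and that level "e" marks errors. The function name `failures_per_hour` is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the parsed log excerpts in this thread.
LINE_RE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(\w) (\S+) (\d+) (\d+) \[([^\]]*)\] (.*)$"
)


def failures_per_hour(lines, needle="failed to run correctly"):
    """Count error-level lines containing `needle`, grouped by hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        stamp, level, message = match.group(1), match.group(2), match.group(7)
        if level == "e" and needle in message:
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f")
            counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Feeding it the lines of a parsed log file gives one counter entry per hour in which the operation failed, which makes the difference between "once an hour" and "every minute" easy to spot.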
  • I guess there is a duplicate member. Open HpcClusterManager, go to "Configuration->Users",

    and check whether there is a user with an empty name (one whose display name is just a SID).

    Alternatively, go to the domain controller and check whether two users share the same SID.

    • Marked as answer by Bobby013 Sunday, March 1, 2015 8:14 PM
    Saturday, February 28, 2015 3:12 AM
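
The stack trace earlier in the thread shows the exception coming from a .NET Dictionary insert inside ComputerContainsSecurityGroup.Validate, and a .NET Dictionary throws exactly this ArgumentException when the same key is added twice. The sketch below illustrates the kind of check suggested here: finding entries that share one key in a membership list. That the key is the SID is an assumption based on this answer, not something the log confirms, and the function name `duplicate_sids` and all sample values are hypothetical.

```python
from collections import defaultdict


def duplicate_sids(members):
    """Given (sid, account_name) pairs, return {sid: [names]} for repeated SIDs."""
    by_sid = defaultdict(list)
    for sid, name in members:
        by_sid[sid].append(name)
    # Keep only SIDs that occur more than once; these would collide
    # if each SID were used as a dictionary key with add-only semantics.
    return {sid: names for sid, names in by_sid.items() if len(names) > 1}


# Hypothetical membership list: the last entry has no resolvable name,
# so only its SID would be displayed.
members = [
    ("S-1-5-21-1-1001", "EXAMPLE\\alice"),
    ("S-1-5-21-1-1002", "EXAMPLE\\bob"),
    ("S-1-5-21-1-1002", ""),
]
print(duplicate_sids(members))
```

An entry with an empty name alongside a named entry for the same SID is the pattern this answer suggests looking for in the Users list.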
  • Hello Yongjun Tian,

    You were right: there really was an account in the list for which only the SID was displayed. After removing this SID entry, the error message was gone.

    Thank you very much for your support, you really helped me a lot!

    All the best, Bobby

    Sunday, March 1, 2015 8:14 PM