Can't see Linux node in HPC Pack 2016 Update 2

    Question

  • Hello.

    I've installed HPC Pack 2016 Update 2 with a single head node (latest release, 5.2.6277.0) and am trying to add a Linux node to the cluster.

    The installation finished successfully on the Linux node, but the node is not displayed in Resource Manager.

    My log files:

    hpclinuxagent.log

    2018/10/02 22:32:31 The command line is: /opt/hpcnodemanager/hpcagent enable
    2018/10/02 22:32:31 The command line is: /opt/hpcnodemanager/hpcagent daemon
    2018/10/02 22:32:31 The connection string is hpc-head
    2018/10/02 22:32:31 Configure iptables to allow incoming tcp connection to 40000 and 40002.
    2018/10/02 22:32:32 HPC node manager process started
    2018/10/02 22:32:34 Daemon pid: 30411
    2018/10/02 22:32:34 HPC Linux node manager daemon is enabled
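The agent log above says iptables was configured to allow incoming TCP connections to ports 40000 and 40002 on the Linux node. One way to confirm those ports are actually reachable from another machine (for example the head node) is a plain TCP connect test. The following is a minimal sketch, not a confirmed fix for this problem; the function name `port_open` and the host name passed on the command line are illustrative assumptions, and an open port only shows that something is listening, not that the node manager is healthy.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # The host name is supplied by the caller, e.g. the Linux node's name.
    node = sys.argv[1]
    # Ports taken from the hpclinuxagent.log line about iptables.
    for port in (40000, 40002):
        state = "reachable" if port_open(node, port) else "NOT reachable"
        print(f"{node}:{port} is {state}")
```

Run it from the machine that needs to reach the node, passing the node's host name as the only argument.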

    nodemanager

    [10/02 22:32:31.650] 30417 info: Log system works.
    [10/02 22:32:31.650] 30417 info: Version: 2.3.6.0
    [10/02 22:32:31.652] 30417 info: Cleaning up zombie processes
    [10/02 22:32:31.657] 30417 info: Cleanup zombie result: Cleaning up tasks in CGroup...

    [10/02 22:32:31.658] 30417 info: Initializing GPU driver.
    [10/02 22:32:31.659] 30417 warning: Executing nvidia-smi -pm 1, error code 127
    [10/02 22:32:31.659] 30417 info: Initialize GPU ret code 127
    [10/02 22:32:31.659] 30467 info: Monitoring thread created. Interval 1
    [10/02 22:32:31.659] 30417 warning: HostsFileUri not specified, hosts manager will not be started.
    [10/02 22:32:31.659] 30417 info: Main: entering sleep loop.
    [10/02 22:32:31.660] 30464 info: Opening at https://0.0.0.0:40002, result opened.
    [10/02 22:32:31.683] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:32:32.674] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 4.
    [10/02 22:32:34.659] 30468 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:33:01.749] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:33:04.338] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 3.
    [10/02 22:33:31.815] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:33:36.004] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 2.
    [10/02 22:34:01.881] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:34:07.669] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 1.
    [10/02 22:34:31.947] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:34:39.336] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 0.
    [10/02 22:35:02.013] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:35:32.078] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:36:02.144] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:36:32.255] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:37:02.319] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:37:32.385] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:37:34.737] 30468 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:38:02.837] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService
    [10/02 22:38:33.109] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService

    ..........

    The same messages repeat many more times.
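Because the nodemanager log is dominated by repeated lines, it can help to collapse it into a count per distinct message before reading it. The sketch below assumes every entry follows the `[MM/DD HH:MM:SS.mmm] <thread id> <level>: <message>` layout seen in the excerpt above; the names `LINE_RE` and `summarize` are made up for this example, and digits inside messages are replaced with `N` so that lines differing only in a counter (such as the remaining retry count) are grouped together.

```python
import re
from collections import Counter

# Matches entries such as:
#   [10/02 22:32:31.650] 30417 info: Log system works.
LINE_RE = re.compile(
    r"^\s*\[(?P<ts>[^\]]+)\]\s+(?P<tid>\d+)\s+(?P<level>\w+):\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count (level, message) pairs, ignoring timestamps and thread ids."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank lines and anything not in the expected layout
        # Replace digit runs so messages that differ only by a number group together.
        msg = re.sub(r"\d+", "N", m.group("msg").strip())
        counts[(m.group("level"), msg)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log excerpt above.
    sample = [
        "[10/02 22:32:31.659] 30417 warning: Executing nvidia-smi -pm 1, error code 127",
        "[10/02 22:32:32.674] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 4.",
        "[10/02 22:33:04.338] 30467 warning: Exception when querying Azure node metadata. Request canceled by user.. Remaining retry count: 3.",
        "[10/02 22:33:31.815] 30469 info: ResolveServiceLocation> Resolved serviceLocation HPC-HEAD for SchedulerStatefulService",
    ]
    for (level, msg), n in summarize(sample).most_common():
        print(f"{n:4d}  {level:8s} {msg}")
```

To use it on a real log, pass an open file object to `summarize` instead of the sample list.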

    Please help me solve this problem. Thank you.

    Wednesday, 3 October 2018 2:12 PM

Answers

  • The problem was solved. See:

    https://social.microsoft.com/Forums/en-US/6213ab68-c64c-4ee5-9dc2-119463710c0a/linux-node-does-not-appear-in-cluster-manager?forum=windowshpcitpros

    • Marked as answer by iurii_m Wednesday, 3 October 2018 2:23 PM
    Wednesday, 3 October 2018 2:23 PM