Asker
help!! Scheduler is unresponsive to job submission

Question
Yesterday I didn't have any problems submitting jobs to the head node from a client PC.
Today the client just hangs when I run job submit from the command line. After some time it finally responds with:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Does anyone have any ideas on this?
Thanks in advance.
Thursday, October 8, 2015 4:47 PM
All Replies
Can you run the job submit command on the head node?
If the command can be run on the head node, please check the firewall settings of the head node and confirm that an inbound rule "HPC Job Scheduler Service (TCP-In)" with local ports 5800, 5801, 5969, 5999 is enabled.
If the command cannot be run on the head node either, try restarting the HPC Job Scheduler service.
Friday, October 9, 2015 1:59 AM
Thank you for the feedback on this.
I've done some more investigation. Here are the results:
1. I can submit jobs from the scheduler (head node) without any issues.
2. I can ping the head node from the client without issue.
3. Looking at the firewall settings of the head node, the "HPC Job Scheduler Service (TCP-In)" inbound rule is marked as private and I'm unable to edit it. It only has one port enabled: 5970.
Should I be able to edit that rule?
Should I manually create a new rule and add all the new ports?
Thanks again for the help with this.
Luke
- Edited by Lsagur Friday, October 30, 2015 1:23 PM
Friday, October 30, 2015 1:22 PM
After changing the configuration on the head node to "do not manage firewall settings", I now see the rule in the firewall with all the ports you mentioned enabled.
I still can't connect to the head node from the client PC though (although I can ping it).
Are any outbound rules required on the client PC? Currently I don't see anything in its firewall settings.
Thanks,
Luke
Friday, October 30, 2015 1:56 PM
Hi,
Setup usually configures the necessary firewall rules automatically. Please double check:
1. Whether your client machine has joined the same domain as your head node
2. Whether the version of your client matches your server version
3. Whether you are logged on to your client machine with a domain account
4. Whether the Job Manager GUI (HPCJObManager.exe) is able to connect to the head node
Qiufang Shi
Monday, November 2, 2015 2:05 AM
Thank you Qiufang for all the suggestions.
1. yes same domain.
2. Client PC -> MS HPC Pack 2012 R2 Client Components (4.4.4864.0), Head Node -> MS HPC Pack 2012 Server Components (4.4.4864.0).
3. yes domain account
4. No -> There was a network problem or the server was disconnected. Please try connection again. Failed to connect to the following service(s) on the head node: scheduler service.
I tried on another PC on the network and I'm getting the same result.
What do you recommend as the next steps?
Thanks,
Luke
Monday, November 2, 2015 12:37 PM
The configuration looks okay. Next, check whether you can reach the scheduler port (5800) from the client machine, for example with telnet: telnet hostname 5800 (or, at the interactive telnet prompt: open hostname 5800)
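Where telnet is not installed on the client, the same reachability test can be approximated with a short script. This is only a sketch, assuming Python 3 is available on the client; `can_connect` and `check_ports` are hypothetical helper names, not part of HPC Pack.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, `check_ports("headnode", [5800, 5801, 5969, 5999])` would test the ports named earlier in this thread, where "headnode" is a placeholder for the real head node name. A successful connection only shows that something is listening on the port, not that the listener is the scheduler.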
Also check whether IPv6 is enabled and whether your DNS resolves the head node name to an IPv6 address by default. Alternatively, try connecting to the scheduler by its IPv4 address directly from the command line, for example: job submit /scheduler:xxx.xxx.xxx.xxx hostname
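To see which address families the head node name resolves to from the client, something like the following could be used. It is a sketch assuming Python 3 on the client; `resolve_by_family` is a hypothetical helper name.

```python
import socket


def resolve_by_family(host: str) -> dict:
    """Return the IPv4 and IPv6 addresses that host resolves to."""
    result = {"IPv4": [], "IPv6": []}
    families = (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6))
    for label, family in families:
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except OSError:
            # socket.gaierror (a subclass of OSError) means there is no
            # address of this family for the given name.
            continue
        for info in infos:
            address = info[4][0]  # sockaddr tuple starts with the address
            if address not in result[label]:
                result[label].append(address)
    return result
```

If the "IPv6" list is non-empty for the head node name, passing one of the "IPv4" addresses to /scheduler: is a quick way to rule out an IPv6 routing problem.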
Also check whether you can submit jobs from the compute nodes. This may help isolate the problem (whether it is a cluster configuration issue or a client problem).
Lastly, check whether a firewall on your client is preventing your application from connecting to the head node.
Qiufang Shi
Tuesday, November 3, 2015 1:39 AM
Qiufang, we figured it out!
VNC viewer was installed on the head node and was using port 5800, so we had to turn that service off to free up the port. After that change, everything is functioning well.
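A port conflict like this can be spotted by trying to bind the port locally on the head node. The following is a minimal sketch, assuming Python 3 is available there; `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # No SO_REUSEADDR here: the bind should fail if another
            # socket already holds the same address and port.
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

A False result for `port_is_free(5800)` means the bind failed, most likely because another process already holds the port. A True result is weaker evidence, because a listener bound to a different local address may not be detected. On Windows, `netstat -ano | findstr :5800` lists the owning process ID.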
Thanks for your help.
Luke
- Proposed as answer by qiufang shi (Microsoft employee) Saturday, November 7, 2015 2:53 AM
Friday, November 6, 2015 5:09 PM
Changed the WCF service to just run as the Local System account instead of specifying it in the service panel.
Sunday, November 10, 2019 6:18 PM