Currently using HPC Pack 2016 Update 1.
After leaving the cluster running over a weekend, we had roughly 40 SOA jobs, each with somewhere between 200 and 70,000 requests. After a few days, the Cluster Manager application locked up, and viewing job details threw random exceptions. The SQL Server process (which runs on the single head node) was using roughly 3 GB of RAM. Restarting the SQL Server process made Cluster Manager responsive again. Are there any known resource leaks in how HPC Pack interacts with SQL Server?
The SQL Server version on the head node is:
2018-04-26 09:33:44.55 Server Microsoft SQL Server 2016 (RTM) - 13.0.1601.5 (X64)
Apr 29 2016 23:23:58
Copyright (c) Microsoft Corporation
Express Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
This is a development HPC instance, so the topology is one head node and two worker nodes, all on an Enterprise network. The SQL Server instance runs on the head node.
Is there a way to detect what is causing SQL Server's memory usage to grow slowly over time, and is there any additional diagnostic information I should provide?
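One way to gather that kind of diagnostic information might be to snapshot the sys.dm_os_memory_clerks view at intervals and compare the snapshots to see which memory clerk types are growing. The sketch below is only an illustration under some assumptions: it relies on the third-party pyodbc package, the connection string shown in the usage note is a placeholder, and the helper names take_snapshot and top_growth are made up for this example. Querying the view requires VIEW SERVER STATE permission.

```python
from typing import Dict, List, Tuple

# Sums memory per clerk type, in MB. The pages_kb column exists in
# SQL Server 2012 and later.
CLERK_QUERY = """
SELECT type, SUM(pages_kb) / 1024.0 AS size_mb
FROM sys.dm_os_memory_clerks
GROUP BY type
"""


def take_snapshot(conn_str: str) -> Dict[str, float]:
    """Return {clerk_type: size_mb} for the instance behind conn_str.

    Hypothetical helper: assumes pyodbc is installed and conn_str is a
    valid ODBC connection string for the head node's SQL Server instance.
    """
    import pyodbc  # third-party; imported here so top_growth works without it

    conn = pyodbc.connect(conn_str)
    try:
        rows = conn.cursor().execute(CLERK_QUERY).fetchall()
    finally:
        conn.close()
    return {row[0]: float(row[1]) for row in rows}


def top_growth(
    before: Dict[str, float], after: Dict[str, float], n: int = 5
) -> List[Tuple[str, float]]:
    """Return the n clerk types whose size grew the most between snapshots."""
    clerk_types = set(before) | set(after)
    deltas = {
        clerk: after.get(clerk, 0.0) - before.get(clerk, 0.0)
        for clerk in clerk_types
    }
    # Largest growth first.
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

A possible usage would be to call take_snapshot with something like "DRIVER={ODBC Driver 17 for SQL Server};SERVER=<head-node>\<instance>;Trusted_Connection=yes;" (placeholders, not real values) once before the SOA jobs start and again a day later, then pass both results to top_growth. Note that SQL Server normally keeps memory it has acquired, up to the configured "max server memory" setting, so steady growth on its own does not necessarily indicate a leak.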