HPC Error due to Temp Files

  • Question

  • I spotted 25 GB of temp files from a model run on the HPC headnode, where the 50 packets were not cleaned up by HPC. I suspect the user stopped the model run outside of the HPC interface. All further runs failed until I deleted these files. The same problem appears when I run the same model many times in parallel: the temporary files seem to accumulate gradually until either RAM or disk (I'm not sure which) is filled. Running the model in series does not cause any of these problems. The error message from HPC says that it "Failed to (open or write to) the output file for packet xx". How do I identify the bottleneck on my system?
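    One quick way to check whether leftover packet files are eating disk space is to measure their total size and the free space on the headnode. The sketch below is a generic diagnostic, not specific to HPC Pack; the directory path and the file-name prefix are hypothetical and would need to be replaced with wherever your model actually writes its packet files:

    ```python
    import os
    import shutil

    def temp_file_usage(directory, prefix=""):
        """Sum the size in bytes of files under `directory` whose names start with `prefix`."""
        total = 0
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if name.startswith(prefix):
                    try:
                        total += os.path.getsize(os.path.join(root, name))
                    except OSError:
                        pass  # file was deleted between listing and stat; skip it
        return total

    def free_disk_bytes(path):
        """Free bytes on the filesystem holding `path`."""
        return shutil.disk_usage(path).free

    if __name__ == "__main__":
        # Hypothetical location of the model's packet/temp files.
        temp_dir = r"C:\Temp\ModelRuns"
        used = temp_file_usage(temp_dir, prefix="packet_")
        free = free_disk_bytes(temp_dir)
        print(f"Leftover packet files: {used / 1e9:.1f} GB; free disk: {free / 1e9:.1f} GB")
    ```

    Running this before and after a parallel batch would show whether the files accumulate across runs (pointing at a disk bottleneck) or only during a run (pointing at memory or concurrent I/O limits).
    
    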

    I am running a grid made up of several Intel Xeon X5670 2.93GHz machines with 16 physical cores each, virtualised to give 18 cores over 5 virtual machines. The headnode is allocated 2 virtual cores, 16GB of RAM and more than 200GB of HDD free. Each workstation node is allocated 2 virtual cores, 16GB of RAM and more than 200GB of HDD free. Each compute node is allocated 4

    Friday, March 8, 2013 12:27 PM