I've been performance testing a graphics rendering program, NewTek LightWave, on a 4-node cluster:

(3) dual-socket quad-core servers, 4 GB RAM (8 cores each)
(1) dual-socket quad-core server, 8 GB RAM (8 cores)
(1) quad-socket quad-core server, 16 GB RAM (16 cores)

All are 64-bit Windows Server 2008 running HPCC 2008.
I've configured the software to start one thread per core and am running a series of tests to see how much RAM is being used.
If I submit a job using 4 cores to an 8-core node, it will consume all the available RAM (all 4 GB or all 8 GB, depending on the node).
If I submit a job using 2 cores, it will consume 4 GB of RAM (all the RAM on a 4 GB server, 50% of it on an 8 GB server).
If I submit a job using 1 core, it will consume 2 GB of RAM.
I have verified that the same holds on the 16-core server.
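The pattern in those numbers looks like roughly 2 GB per thread, capped by the node's physical RAM. A quick sanity check of that model against the figures above (a Python sketch; the 2 GB figure is just what I observed, not anything documented):

    # Sanity check: observed usage ~ 2 GB per thread, capped at the node's RAM.
    GB_PER_THREAD = 2

    def expected_usage_gb(threads, node_ram_gb):
        return min(GB_PER_THREAD * threads, node_ram_gb)

    for threads, ram in [(4, 4), (4, 8), (2, 4), (2, 8), (1, 4)]:
        print(f"{threads} thread(s), {ram} GB node -> ~{expected_usage_gb(threads, ram)} GB used")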
I can of course use all the RAM by submitting more threads, so all of the memory is accessible, but each thread hits a 2 GB ceiling. I am performance testing to find the sweet spot for how much RAM per thread is ideal, and that is how I came across this behavior.
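To pin down whether the ceiling is really per thread or per render process, I'm sampling per-process memory while a job runs. This is a minimal sketch using Python with psutil, and it assumes the render processes show up as lwsn.exe (LightWave's ScreamerNet node); adjust the name to whatever Task Manager actually shows on the nodes:

    # Sample resident memory of each render process so a per-process
    # ceiling (if any) is visible while a job is running.
    import time
    import psutil

    TARGET = "lwsn.exe"  # assumed render-node process name; adjust as needed

    def sample():
        total = 0.0
        for proc in psutil.process_iter(["name", "memory_info"]):
            name = proc.info["name"]
            mem = proc.info["memory_info"]
            if name and name.lower() == TARGET and mem:
                rss_gb = mem.rss / 1024 ** 3
                total += rss_gb
                print(f"pid {proc.pid}: {rss_gb:.2f} GB resident")
        print(f"total across render processes: {total:.2f} GB\n")

    while True:  # sample every 30 seconds while the job runs
        sample()
        time.sleep(30)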
I am looking to determine what is limiting the RAM to 2 GB per thread: is it something in the hardware, something in the OS, or something in the application?
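One thing I want to rule out first is the application itself: if the render process is a 32-bit build, then even on 64-bit Windows it normally gets only 2 GB of user address space per process (up to 4 GB if the executable is linked large-address-aware), which would line up with the ceiling above regardless of the hardware or the OS. A minimal check, again assuming the process name is lwsn.exe, using the IsWow64Process API via ctypes:

    # Report whether each render process runs under WOW64 (i.e. is 32-bit).
    import ctypes
    import psutil

    kernel32 = ctypes.windll.kernel32
    PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

    def is_32bit(pid):
        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            raise OSError(f"cannot open pid {pid}")
        try:
            wow64 = ctypes.c_int(0)
            if not kernel32.IsWow64Process(handle, ctypes.byref(wow64)):
                raise OSError("IsWow64Process failed")
            return bool(wow64.value)
        finally:
            kernel32.CloseHandle(handle)

    for proc in psutil.process_iter(["name"]):
        name = proc.info["name"]
        if name and name.lower() == "lwsn.exe":  # assumed process name
            print(proc.pid, "32-bit" if is_32bit(proc.pid) else "64-bit")

If every render process reports 32-bit, the 2 GB ceiling would be coming from the application build rather than from the cluster hardware or the scheduler.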