Hi
We have set up an environment with 7 Cisco UCS compute nodes, each equipped with four 12-core CPUs (48 cores per node) and running Windows 2012 R2 HPC Update 3.
Our MPI application seems to run fine when using all CPU cores, but execution times vary widely depending on the resources we allocate to the job. The job does a lot of calculations, so it relies heavily on CPU.
To get stable execution times we have to use fewer than the 48 cores available on each node. We have found that the optimal setups are:
2 nodes 40 cores each
3 nodes 30 cores each
7 nodes 13 cores each
If I calculate the total number of cores in each case, it comes to around 80-100, so is there some sort of limit that we cannot find?
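To make the arithmetic explicit, here is a small sketch that computes the total for each of the three configurations listed above. The variable and function names are only illustrative, and it assumes nothing beyond the node and core counts given in this post.

```python
# (nodes, cores per node) for the three stable configurations listed above
configs = [(2, 40), (3, 30), (7, 13)]

def total_cores(nodes, cores_per_node):
    """Total cores used across all nodes in one configuration."""
    return nodes * cores_per_node

totals = [total_cores(n, c) for n, c in configs]

for (n, c), t in zip(configs, totals):
    print(f"{n} nodes x {c} cores = {t} cores in total")
# totals are 80, 90 and 91
```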
Furthermore, network traffic when running on 3 nodes with 30 cores each is 400-500 Mbps, while running the same job with 40 cores each shows only 0-500 Kbps. So when running in a non-optimal configuration, network traffic drops significantly.
At the moment we are not fully utilizing our hardware, so any suggestions would be much appreciated.