CPU utilization problem when running parallel Fluent on Windows HPC 2008R2

  • Question

  • Hello,

    I have just reinstalled Windows HPC 2008R2 on a small cluster consisting of one head node and nine compute nodes (8 cores each), where we run CFD codes such as Fluent and Star-CCM+.

    When I run benchmarks with both codes, I get good scalability up to 16 cores (on 2 nodes), but performance drops suddenly when I add one more core. Moreover, while monitoring Task Manager, I noticed that CPU utilization on the compute nodes varied between 25% and 50% when I submitted a computation on three nodes, whereas it was 100% on two nodes (16 cores). I should point out that with a previous installation of Windows HPC 2008R2 on the same nodes, I got good scalability when increasing the number of nodes.
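    For anyone reproducing this kind of benchmark, the scaling drop can be quantified as speedup and parallel efficiency relative to a single-node run. The following is only a minimal sketch: the function name `parallel_metrics` and the timings are hypothetical placeholders, not measurements from this cluster.

```python
def parallel_metrics(baseline_cores, baseline_time, runs):
    """Return (cores, speedup, efficiency) tuples relative to a baseline run.

    `runs` maps a core count to a wall-clock time in seconds.
    """
    results = []
    for cores in sorted(runs):
        speedup = baseline_time / runs[cores]
        # Ideal speedup equals the ratio of core counts, so efficiency
        # is the measured speedup divided by that ratio.
        efficiency = speedup / (cores / baseline_cores)
        results.append((cores, speedup, efficiency))
    return results


# Hypothetical wall-clock times in seconds (assumed values, for illustration only).
timings = {8: 800.0, 16: 420.0, 24: 900.0}
for cores, speedup, efficiency in parallel_metrics(8, timings[8], timings):
    print(f"{cores:2d} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

    An efficiency well below 100% on three nodes, next to a value close to 100% on two, would match the CPU utilization pattern described above.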

    I would be very grateful if someone has any idea about the source of the problem. Thank you!


    Saturday, May 5, 2012 8:24 PM

All replies

  • Hi,

    This looks interesting, as Fluent is one of the applications we test in each release and it scales very well. Below are some of the things I can think of.

    1. Is the 3rd node identical to the other two? Do they have the same hardware? Are they connected to the same network switch?

    2. Some Fluent models scale better on 2^n cores. Could you try your scenario on 4 nodes instead of 3?

    3. What version of HPC are you using? Are you running with the latest version?

    4. What's the power setup on your cluster? Could you do "powercfg -l" and make sure your nodes are on "high performance" mode?

    5. Are you running the Fluent bits from a network share? Did you install Fluent on every node?
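    The power plan check in item 4 can be scripted when there are many nodes. This is a rough sketch that only parses text in the shape `powercfg -l` usually prints, where the active scheme is marked with a trailing `*`. The helper name `active_power_plan` and the sample text are assumptions for illustration; collecting the output from each node (for example through the job scheduler) is left out.

```python
import re


def active_power_plan(powercfg_output):
    """Return the name of the scheme marked active (*) in `powercfg -l` output."""
    for line in powercfg_output.splitlines():
        # Scheme lines end with "(Name)" and, for the active one, a "*".
        match = re.search(r"\((.+)\)\s*\*\s*$", line)
        if match and "GUID" in line:
            return match.group(1)
    return None


# Illustrative sample of the listing format (assumed, not captured from this cluster).
SAMPLE = """Existing Power Schemes (* Active)
-----------------------------------
Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced) *
Power Scheme GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c  (High performance)
"""

plan = active_power_plan(SAMPLE)
print(f"Active plan: {plan}")
if plan != "High performance":
    print("Node is not on the high performance plan")
```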


    Monday, May 7, 2012 5:39 PM
  • Hi Michael,

    Thank you for your reply. Here are the answers to your questions. Unfortunately, a new series of tests didn't show any change.

    1) The nine nodes in the cluster are all identical, with the same hardware (based on dual Xeon E5440), and their system was installed on bare metal from the same template. All compute nodes and the head node are connected to the same switch.

    2) When I launch a computation on 4 nodes or more, performance is unfortunately even lower than on three nodes. I observe the same behavior in CPU utilization.

    3) I am using the HPC Pack 2008 R2 SP3.

    4) The power setup was set to the default mode. Switching it to "high performance" mode on all nodes didn't change the previous situation.

    5) I run Fluent from a network shared folder on the head node. I followed all the instructions given by ANSYS dealing with running Fluent on Windows HPC (http://www.ansys.com/About+ANSYS/Partner+Programs/Complete+Windows+Support+for+ANSYS/ANSYS+FLUENT+Installation+FAQs)

    Following your suggestion, I also ran new tests after installing ANSYS Fluent locally on four nodes, but didn't observe any change.

    As I mentioned in my previous post, Fluent scaled really well on these nodes with a previous installation of Windows HPC, although the network is GigE. Concerning network performance, the HPC built-in MPI diagnostics and Fluent both report a latency of 30-35 µs and a bandwidth of around 105 MB/s, which are expected values for GigE.
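    To put these figures in perspective, a common first-order estimate of point-to-point message time is latency plus size divided by bandwidth. The sketch below uses that simple model with the values reported above (32.5 µs is just the midpoint of the reported range). The function name `transfer_time_us` is hypothetical, MB is assumed to mean 10^6 bytes, and the model ignores contention, protocol overhead, and collective operations.

```python
def transfer_time_us(message_bytes, latency_us=32.5, bandwidth_mb_s=105.0):
    """Estimate one message's transfer time in microseconds.

    Uses the simple model: time = latency + size / bandwidth.
    """
    # With MB taken as 10**6 bytes, 1 MB/s equals 1 byte per microsecond.
    bytes_per_us = bandwidth_mb_s
    return latency_us + message_bytes / bytes_per_us


for size in (1_000, 100_000, 10_000_000):
    print(f"{size:>10d} bytes: {transfer_time_us(size):12.1f} us")
```

    Under this model, small messages are dominated by the latency term and large ones by the bandwidth term, so the same link can behave very differently depending on the message sizes exchanged.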

    Thank you for your help. Do you see anything else to try?


    Tuesday, May 8, 2012 3:00 AM
  • Hi Antoine,

    When you ran the application on the previous HPC installation, did you run the same model? Note that some models scale better than others. Could you try a different model?


    Tuesday, May 8, 2012 3:51 AM
  • Thank you Michael,

    I ran the benchmark files from ANSYS Fluent (http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks) and a personal case, ranging from simple external aerodynamics at low Reynolds numbers to more complicated turbulent cases (they also represent the main models used on the cluster). They scaled well with the previous installation, but no longer do when I use more than two nodes.

    Are there some further settings to make full use of the processors on more than two nodes? Thank you very much for your help.


    Tuesday, May 8, 2012 7:52 PM
  • Antoine,

    Which version of ANSYS Fluent are you using?


    Tuesday, May 8, 2012 8:11 PM
  • I am currently using version 13.


    Tuesday, May 8, 2012 9:40 PM
    And in your previous runs, did you also use version 13? There might be a compatibility issue, since we were testing with Fluent 14 when HPC SP3 shipped. Do you have Fluent 14 that you can run against?

    Thanks for staying patient. Your problem is quite unique, and I don't have Fluent 13 to repro your issue.


    Wednesday, May 9, 2012 1:31 AM
  • I ran my previous tests with Fluent 13 too. I have access to Fluent 14 though it is not installed on the cluster yet. I will try this solution and hope it can solve the problem.

    Thank you for your help!


    Wednesday, May 9, 2012 2:34 PM
  • Hi Michael,

    I ran the same cases with Fluent 14 on the cluster and got the same result as with Fluent 13: the same drop in performance beyond two nodes. As before, on three nodes all the cores on each compute node are equally loaded, but only at around 30% of their capacity, while on 2 nodes they run at 100%.

    Is there any other setting I can change to solve this problem?

    Thank you for waiting for the results.


    Friday, May 11, 2012 5:05 PM
  • Hi Antoine,

    It's really strange, because Fluent 14 is one of the scenarios we test before shipping. When your CPU utilization drops, I guess the total time to finish a model goes up as well? So the more machines you have, the longer it takes? Could you also check your firewall settings?


    Friday, May 11, 2012 5:48 PM
  • Hi Antoine,

    This scenario is interesting enough that we want to work closer with you on it. Please email me at micman@microsoft.com so we can start a thread with our perf lab guys.


    Friday, May 11, 2012 9:25 PM
  • Hello

    You can find a good tutorial on parallel processing with ANSYS Fluent at this link.


    Best regards

    Reza Amini

    Wednesday, November 30, 2016 2:23 PM