29 September 2009 14:23
I'm currently running Lizard on a 32-node cluster (16 dual-core + 16 quad-core nodes). At the start, 8 of the quads don't run in the verification tests until the rest have finished; at that point only four of them run, and the test passes.
In the consistency test only the quads run, and this passes. The normal result is around 90% consistency and 60 GFlops.
Then in the tuning process 3 of the duals won't run.
All nodes are online and have been freshly reimaged, except for the head node, which did have the Intel Cluster Tools installed. As they seemed to be creating some problems for Lizard, they have been taken off for the moment.
The reason Lizard needs to be run is so that I can get results for my MSc project.
Any help will be gratefully received.
29 September 2009 22:34
When you say Lizard won't run, what error or message are you seeing? Are the jobs failing to run altogether, and are you getting the message saying Lizard cannot proceed? Is it possible to post a screenshot of the failure mode?
Can you elaborate on what sort of issue Intel Cluster Tools created for Lizard? In general, if they are not installed in the same directory, Lizard should not have been affected.
Please make sure that you are running Lizard on a US-English locale head node.
2 October 2009 09:52
Hello, sorry about the slow reply.
There aren't any error messages, no other jobs fail on the cluster, and everything else seems to work fine. The heat map shows that the nodes are not doing anything, and the results from Lizard come back showing that the entire cluster is a lot slower than it should be.
After some more testing it seems that it is always the last 3 of the dual-core nodes which don't work. By this I mean that if I select the duals 1 to 16 and the quads 17 to 32, then nodes 14 to 16 won't work with Lizard. If those 3 nodes are not included and 3 of the quads are not included, then the last 3 of the remaining duals won't run.
Running just the duals or just the quads is fine; there are no problems and the results come back as expected.
I have reinstalled Lizard to see if that changes anything, and the results are the same.
The problem with the Intel Cluster Tools was that they seemed to stop MS-MPI: the Windows HPC MPI diagnostic tools failed, but programs compiled with Intel MPI worked fine. Lizard didn't work at all. The Intel Cluster Tools have been uninstalled.
9 December 2009 07:11 Editor
Were you able to get Lizard to function across your entire cluster yet? Did you re-image the compute nodes to remove the Intel Cluster Tools to make sure they were all in a known / consistent configuration?
9 December 2009 16:39
After reimaging the nodes, and that not working, I swapped the HDDs of the affected nodes (the nodes that stopped running Lizard) with those of nodes that worked, and it has been running fine since. I have no idea why that worked, but it has solved the problem.
- Marked as answer by Don Pattee (Moderator), 15 January 2010 18:59