MPI compatibility between HPC 2008 and MS-MPI 8.0.12438

  • Question

  • Hi all,

    I encountered some problems with MPI compatibility, as follows. Could you help me figure them out?

    1. The first problem:

    - The external program was coded and compiled on Windows 10 using Microsoft Visual C++ with HPC 2008 R2.

    - The external program runs successfully on another workstation that has HPC 2008 R2 installed.

    - The external program fails on my workstation, which does not have HPC 2008 R2; I installed MS-MPI version 8.0.12438 instead. The error is: "PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range". (I coded and compiled my own program using MS-MPI version 8.0.12438 and it ran successfully.)

    Questions:

    + Does the difference in MPI versions cause the problem, and how can I solve it?

    + Does the same thing happen if I run a program compiled with MS-MPI version 8.0.12438 on a PC with HPC 2008 R2 installed?

    2. The second problem.

    Because of the first problem, I uninstalled MS-MPI 8.0.12438 and installed HPC 2008 R2 for client on my PC. Then the following happened:

    + I had a program written and compiled using Microsoft Visual C++ 2017, and it ran successfully with MS-MPI version 8.0.12438. Then,

    + I uninstalled MS-MPI version 8.0.12438 and installed HPC 2008 R2 for client.

    + After that, I re-opened the project in Microsoft Visual C++. Functions related to MPI were underlined in red (I had changed the include directory and the library directory to the HPC 2008 R2 directories), but I could still compile the program successfully.

    + However, when I ran the .exe file, there was an MPI-related error showing that some MPI functions could not run.

    Questions:

    - Does the change in MPI version affect the operation of Microsoft Visual Studio (even though I changed the include, library, and execution directories)?

    - How can I fix the problem, and how can I re-install HPC 2008 R2 correctly?

    I appreciate your help.

    Thanks and best regards,

    Friday, June 9, 2017 4:01 PM

Answers

  • Hi Vinh,

    I downloaded the demo2 program there and here's my observation:

    1) It ran successfully on v8

    2) It failed with any MS-MPI version before v4.1.4174 (which shipped with HPC Pack 2012 Service Pack 1). I looked at our code history, and this is indeed the version where we put in some changes for attribute handling during the delete callback.

    Starting with HPC Pack 2012 R2 and MS-MPI v4.2 you can freely upgrade MS-MPI. However, given that you're on HPC Pack 2008 R2, I think the only way to make this work is as follows:

    1) Uninstall the MS-MPI version that comes with HPC Pack 2008 R2. You also need to remove msmpi.dll from %windir%\system32 and %windir%\syswow64

    2) Download HPC Pack 2012 Service Pack 1 (https://www.microsoft.com/en-us/download/details.aspx?id=39962). Download the file HPCPack2012SP1-Full.zip and extract it. Under the release\setup folder you will see two files, mpi_x64.msi and mpi_x86.msi; install both MSIs to get MS-MPI v4.1.4174. This version should work with HPC Pack 2008 R2 and should also include the change needed to work with PETSc. Let me know if you run into issues.
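    For reference, the two steps above can be sketched as a dry-run script. POSIX shell is used here only to echo the Windows-side commands for review; the extraction path, the use of msiexec /i with /passive, and the del commands are assumptions for illustration, not part of the original instructions.

    ```shell
    # Dry-run sketch of the workaround above. The extraction path and the
    # msiexec/del invocations are hypothetical; review the echoed commands
    # before running them by hand on a real node.
    install_msmpi_4_1() {
        extracted="$1"          # folder where HPCPack2012SP1-Full.zip was extracted
        run="${2:-echo}"        # default to a dry run that only prints commands
        # Remove the msmpi.dll copies left behind by HPC Pack 2008 R2.
        $run del '%windir%\system32\msmpi.dll'
        $run del '%windir%\syswow64\msmpi.dll'
        # Install the v4.1.4174 runtime for both architectures.
        $run msiexec /i "$extracted"'\release\setup\mpi_x64.msi' /passive
        $run msiexec /i "$extracted"'\release\setup\mpi_x86.msi' /passive
    }

    install_msmpi_4_1 'C:\HPCPack2012SP1'
    ```

    Once the echoed commands look right for your machine, they can be run manually in an elevated command prompt.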

    Anh


    Tuesday, June 13, 2017 9:55 PM

All replies

  • Hi - HPC Pack 2008 R2 does not support the MS-MPI upgrade scenario, which means it will only work with the MS-MPI version that ships with HPC Pack 2008 R2. Starting from HPC Pack 2012, upgrading MS-MPI is supported. HPC Pack 2012 R2 is free to download and upgrade to, and if possible I would recommend that. After upgrading HPC Pack, you can independently update MS-MPI to v8. Here's the procedure to upgrade MS-MPI in HPC Pack 2012 or later:

    1) Download the MS-MPI setup file (MSMPISetup.exe) and place it on a share accessible to the compute nodes and the head node. If you are using Azure Burst, you might do the upgrade in two steps (i.e., upgrade MS-MPI on the head node, then upgrade MS-MPI on the compute nodes). Then run the following steps:

    2) Stop the HPC Pack msmpi service on the nodes that you are upgrading. You can use clusrun. E.g., clusrun /nodegroup:ComputeNodes net stop msmpi

    3) Update the MS-MPI version. E.g., clusrun /nodegroup:ComputeNodes \\path_to_folder\MSMPISetup.exe -force -unattend

    4) Restart the msmpi service: clusrun /nodegroup:ComputeNodes net start msmpi
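    The three clusrun steps above can be collected into a single helper, shown here as a dry run so the commands can be inspected before touching the cluster. The node group name and the share path are placeholders for your environment.

    ```shell
    # Dry-run sketch of steps 2-4 above. "ComputeNodes" and the UNC path
    # are placeholders; by default the commands are only echoed, not executed.
    upgrade_msmpi() {
        nodegroup="$1"
        setup="$2"              # UNC path to MSMPISetup.exe on the share
        run="${3:-echo}"        # default to a dry run
        $run clusrun /nodegroup:"$nodegroup" net stop msmpi
        $run clusrun /nodegroup:"$nodegroup" "$setup" -force -unattend
        $run clusrun /nodegroup:"$nodegroup" net start msmpi
    }

    upgrade_msmpi ComputeNodes '\\headnode\install\MSMPISetup.exe'
    ```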

    If you cannot upgrade to HPC Pack 2012 R2 and you want to restore the correct MS-MPI version for HPC Pack 2008 R2, you will need to do the following:

    1) Stop the msmpi service (net stop msmpi) on the node(s) that you want to restore

    2) Uninstall the existing version of MS-MPI. You will also need to uninstall msmpisdk.msi if you installed it earlier

    3) Delete %windir%\system32\msmpi.dll and %windir%\syswow64\msmpi.dll

    4) Install the MS-MPI version that comes with HPC Pack 2008 R2; in this case you can rerun the HPC Pack 2008 R2 Client setup

    Anh

    Friday, June 9, 2017 10:27 PM
  • Thanks, Anh.Vo, for your helpful instructions.

    I re-installed my HPC 2008 package as you suggested. However, the problem persists. Actually, I think it is related to the version of MS-MPI rather than the installation of MPI. The version on my workstation is HPC 2008 MPI v3.4.4169, but the one on the other workstation is HPC 2008 MPI v3.2.3716. The difference may come from some deprecated functions.

    I will find HPC 2008 MPI v3.2.3716 and install it. I will let you know whether this solves my problem.

    Best regards,

    Monday, June 12, 2017 6:29 AM
  • Hi,

    I reinstalled the MPI package on my computer to match the version on the other workstation (v3.2.3716). I installed 3 files: HPCClient_x64.msi, mpi_x64.msi, sdk_x64.msi. In addition, I cannot use the command "net stop msmpi"; it fails with the message "The service name is invalid.". Therefore, I only did steps 2-4.

    The problems persist as follows:

    1. When I run an external program:

    [0]PETSC ERROR: ------------------------------------------------------------------------
    [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
    [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
    [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
    [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
    [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
    [0]PETSC ERROR: to get more information on the crash.
    [0]PETSC ERROR: --------------------- Error Message ------------------------------------
    [0]PETSC ERROR: Signal received!
    [0]PETSC ERROR: ------------------------------------------------------------------------
    [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 3, Wed Aug 29 11:26:24 CDT 2012
    [0]PETSC ERROR: See docs/changes/index.html for recent updates.
    [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
    [0]PETSC ERROR: See docs/index.html for manual pages.
    [0]PETSC ERROR: ------------------------------------------------------------------------
    [0]PETSC ERROR: Cem3Dc.exe on a Win named . by Unknown Mon Jun 12 09:58:54 2017
    [0]PETSC ERROR: Libraries linked from win
    [0]PETSC ERROR: Configure run at Installation_WA, TU Darmstadt, Germany
    [0]PETSC ERROR: Configure options Release, Win64, Complex
    [0]PETSC ERROR: ------------------------------------------------------------------------
    [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file

    The external program runs successfully on the other workstation.

    2. When I run the program compiled on my PC:

    [1]PETSC ERROR:
    [2]PETSC ERROR:
    [0]PETSC ERROR:
    Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c
    Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c
    Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c
    [1]PETSC ERROR:
    [2]PETSC ERROR: PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c
    [0]PETSC ERROR:
    PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c
    PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c
    [5]PETSC ERROR: Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c
    [5]PETSC ERROR: PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c
    [3]PETSC ERROR:
    [4]PETSC ERROR: Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c
    [4]PETSC ERROR: PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c
    Petsc_DelComm() line 415 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c

    [3]PETSC ERROR: PetscSubcommSetTypeGeneral() line 116 in " "D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\subcomm.c

    This program runs successfully when I use MPI v8.0.12438.0.

    I looked at the line "Petsc_DelComm() line 415 in D:\TEMF\Prog\C++2015\petsc-3.3.0\src\sys\objects\pinit.c", where the function MPI_Attr_get is used: "ierr = MPI_Attr_get(comm,Petsc_InnerComm_keyval,&ptr,&flg);CHKERRQ(ierr);".

    I guess the problem is caused by the deprecated MPI function MPI_Attr_get, but I have no idea how to solve this.

    Vinh

     

    Monday, June 12, 2017 8:17 AM
  • I think we did fix an MPI attribute-related bug reported by PETSc a while back. It is likely that newer versions of MS-MPI do not have an issue with this function. Does it work on both workstations when you use MS-MPI v8?

    Is it possible for you to upgrade your workstations to HPC Pack 2012 R2? Do you use HPC Pack to submit jobs across the cluster, or only for MS-MPI? If only for MS-MPI, you can just download and install MS-MPI v8 without HPC Pack.

    Monday, June 12, 2017 9:21 PM
  • I tried on my workstation and my laptop as well (using MS-MPI v8). The external program did not work on either, but it works on the other workstation (which uses MPI v3.2.3716).

    It is quite strange behavior, because I would expect it to work when I downgrade from MS-MPI v8 to MPI v3.2.3716 (the program uses the same PETSc version 3.3.0).

    If I could decide which MPI version to use, I would change to HPC Pack 2012 R2. Unfortunately, a series of supercomputers in my laboratory uses HPC Pack 2008 R2. The technicians had a bad experience when they upgraded to HPC Pack 2012 R2, and they finally downgraded back to HPC Pack 2008 R2. Therefore, I have to work with HPC 2008 (though I would prefer to work with the newer version).

    Monday, June 12, 2017 9:42 PM
  • If your laptop is using MS-MPI v8 and the program does not work on the laptop, the issue might be different. There are a couple of ways we can approach this:

    1) If you can package your application so that I can try to reproduce it on my side, I will give it a shot; if I can reproduce it, I will try to diagnose the issue from the MS-MPI side

    2) Have you tried reaching out to the PETSc authors? They might have seen this issue before and may have some suggestions

    Anh

    Monday, June 12, 2017 9:50 PM
  • Thanks, Anh.Vo, for your suggestion.

    1. I packaged my applications as follows:

    - an external program: (https://www.dropbox.com/sh/46sufvzi6bwm5g2/AAA4JkXUlXF9Q6oQ_mKNFR-ra?dl=0)

     + To run the program, run the file "Run_Cem3D.bat"

     + Problems:

        + runs successfully on the other workstation, which has HPC 2008 R2 v3.2.3716 installed.

        + fails on my workstation and laptop, with either HPC 2008 R2 v3.2.3716 or MPI v8.0.12438 installed.

    - demo program 1: (https://www.dropbox.com/sh/lvqxv7kqgydblly/AAASFlsoT58l9uNl94Zy4WuNa?dl=0)

    + To run the program, run the file "run.bat" inside the folder "x64\Debug"

    + The program is compiled with MPI v8.0.12438 and runs successfully on my laptop, which has MPI v8.0.12438 installed.

    - demo program 2: (https://www.dropbox.com/sh/vb8zv33cy77upe5/AADR-chGo-NT-lAgsX0j9Deia?dl=0)

    + To run the program, run the file "run.bat" inside the folder "x64\Debug"

    + The program is exactly the same as demo program 1; the difference is that it is compiled with HPC 2008 R2 v3.2.3716. The program fails on my laptop, which has HPC 2008 R2 v3.2.3716 installed.

    2. I will try to reach out to the PETSc authors.

    Many thanks,

    Monday, June 12, 2017 10:53 PM
  • Hi,

    I tried the external program and it failed on my machine as well. I ran the program under the debugger, and when the error happened it was not in the context of an MPI call; I don't have much insight into the issue due to the lack of symbols/source for CEM3D. I would suggest contacting the developer of the external program and requesting some assistance.

    I will try demo programs 1 and 2 and will provide an update later.

    Anh

    Tuesday, June 13, 2017 6:18 PM
  • Hi Anh.Vo,

    Many thanks for your help.

    I installed HPC 2012 R2 on my workstation as you advised. My program can now run on my workstation, but the external one could not. And when I move my program to the other workstation with HPC 2008, it does not work.

    However, I think I will end this thread here, because the solution is probably to upgrade everything to HPC 2012 R2 (also for better future use).

    - I will work with the colleague who provided the external program to find possible bugs.

    - In addition, I might discuss with the person responsible for the supercomputer whether we can move everything to HPC 2012 R2.

    It is a complicated issue related to bugs in the old version, and sometimes upgrading is the only choice.

    Again, thanks a lot for your help,

    Vinh

    Wednesday, June 14, 2017 2:04 PM
  • Regarding the external program, I believe we're seeing a different issue, not the same one. While it might appear that changing the environment helps, I don't think the environment (HPC Pack version) is the cause. You should contact the support/author of the external program and request some assistance there.

    If you run into issues installing/using HPC Pack 2012 R2, please contact us. There's a separate forum maintained by the HPC Pack team, and they are very good at providing answers for those issues.

    Anh

    Wednesday, June 14, 2017 2:14 PM