MPI, C# and Executing Tasks

  • Question

  • Hi,

    I was creating a parallel application using MPI in the C# language. I am using the C# MPI bindings available from MPI.NET, developed at Indiana University. Basically, I'm writing some managed code that will be deployed on SQL Server 2005. This is done by using SQL Server's CREATE ASSEMBLY command and providing a DLL file. The problem is that, since the MPI tasks are to be deployed on SQL Server, I cannot use the console in any way (so mpirun or mpiexec cannot be used in the traditional way, nor can I run a script). Is it possible to deploy MPI tasks using C code itself, which I can embed in my classes and thus in the DLL?

    I just found a way to do it, but it's almost a hack, and I would still like to know what else I can do.

    I currently create a .exe file of my MPI code. Then I create another class that calls this executable using the System.Diagnostics.Process class. The Process class allows you to spawn other processes, such as Notepad or anything else, using "cmd.exe". This class is compiled into a DLL, which is then registered as managed code in SQL Server. The problem is that I would now have to hardcode the location of the .exe in the DLL's classes. This is all very dirty. Isn't there a more seamless way?
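The hardcoded-path problem can be avoided by reading the executable's location at run time instead. Below is a minimal sketch of the same launch-a-separate-process approach, written in Python rather than C# with System.Diagnostics.Process, and assuming mpiexec is on the PATH. The environment variable MPI_APP_PATH and the program name matcher.exe are hypothetical.

```python
import os
import subprocess

# Hypothetical environment variable holding the path to the compiled MPI program,
# so that the location is configured outside the code instead of hardcoded.
MPI_APP_ENV_VAR = "MPI_APP_PATH"


def build_mpiexec_command(exe_path, num_procs, extra_args=()):
    """Return the argument list that starts exe_path under mpiexec."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # "-n" sets how many MPI processes mpiexec starts.
    return ["mpiexec", "-n", str(num_procs), exe_path, *extra_args]


def launch_from_config(num_procs, extra_args=()):
    """Look up the executable path at run time, run it under mpiexec, return stdout."""
    exe_path = os.environ.get(MPI_APP_ENV_VAR)
    if not exe_path:
        raise RuntimeError(f"set {MPI_APP_ENV_VAR} to the MPI executable's path")
    cmd = build_mpiexec_command(exe_path, num_procs, extra_args)
    # Passing an argument list (no shell) avoids cmd.exe and its quoting rules;
    # check=True raises CalledProcessError if the MPI job exits with an error.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical program name; this only prints the command, it does not run it.
    print(build_mpiexec_command("matcher.exe", 4))
```

This still launches an external process, so it only moves the path into configuration; it does not remove the dependency on mpiexec.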

    Thanks in advance,

    Abhimanyu Aditya.


    Tuesday, November 20, 2007 12:11 AM

Answers

  • You can do this using pure MPI.NET. It doesn't need the extra launching program mpiexec.exe. You can even have all your "MPI processors" be threads inside one process, which is probably what you want. Check it out and let me know if it helps.

    • Proposed as answer by Lio Monday, July 7, 2008 7:06 PM
    • Marked as answer by Don Pattee Monday, April 13, 2009 5:38 AM
    Saturday, January 19, 2008 11:50 PM

All replies

  • What exactly is the problem you want to solve? I don't think that running an MPI application from SQL Server is the way to go. .NET support inside SQL Server was designed for creating new data types and functions that run inside the SQL Server domain and process, not outside it.

     

    You might consider building a different architecture; I can help you with this.

     

    Xavier

    Thursday, November 22, 2007 1:40 PM
  • Hey, at last, a helping hand! Thanks a lot.

    Yes. The application:

    Say you have a data set in a table, and for each row you want to know which other row is most similar. How would you do this? The simplest way is to:
    - read the data into memory;
    - run two nested for loops, comparing each row to every other row.

    Then return the results. Simple. Of course, this is not how I am actually doing it, since it is far too inefficient: with 10 million rows, it would mean 10 million × 10 million = 100 trillion comparisons. But for the sake of simplicity, this is the problem.
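The naive two-loop approach can be sketched as follows, assuming each row is a numeric vector and using Euclidean distance as a stand-in similarity measure (the thread does not say which metric is used). The nested loops perform n(n-1) comparisons, which is why 10 million rows lead to roughly 10^14 (100 trillion) comparisons.

```python
import math


def most_similar_rows(rows):
    """For each row, return the index of the most similar *other* row.

    Similarity is Euclidean distance (smaller = more similar); this metric is an
    assumption for illustration. Every row is compared to every other row, O(n^2).
    """
    result = []
    for i, a in enumerate(rows):
        best_j, best_dist = None, math.inf
        for j, b in enumerate(rows):
            if i == j:
                continue  # skip comparing a row with itself
            dist = math.dist(a, b)
            if dist < best_dist:
                best_j, best_dist = j, dist
        result.append(best_j)
    return result


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    print(most_similar_rows(data))  # [1, 0, 3, 2]
```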

    Now I want to parallelize this. Assume I have a server with four quad-core processors (16 cores in total to work with). I can distribute the 10 million rows among, say, 10 cores so that each core does the work for 1 million of the rows.
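The distribution step amounts to a block decomposition: split the row indices into contiguous ranges, one per worker, and let each worker find the best match for its own rows against the full data set. This minimal sketch uses Python's ProcessPoolExecutor as a stand-in for MPI processes (it is not MPI and not the poster's code), again assuming Euclidean distance as the similarity measure; all function names are hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def block_ranges(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into n_workers contiguous [start, end) blocks.

    Block sizes differ by at most one, e.g. 10 rows over 3 workers -> 4, 3, 3.
    """
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def nearest_for_block(rows, start, end):
    """Most similar other row for rows[start:end] only.

    Each worker compares its own rows against *all* rows, so workers never need
    to exchange data with one another.
    """
    out = []
    for i in range(start, end):
        best_j, best_dist = None, math.inf
        for j, other in enumerate(rows):
            if j != i:
                d = math.dist(rows[i], other)
                if d < best_dist:
                    best_j, best_dist = j, d
        out.append(best_j)
    return out


def parallel_most_similar(rows, n_workers, executor_cls=ProcessPoolExecutor):
    """Run nearest_for_block on each block in parallel and join results in row order."""
    with executor_cls(max_workers=n_workers) as pool:
        futures = [pool.submit(nearest_for_block, rows, s, e)
                   for s, e in block_ranges(len(rows), n_workers)]
        return [j for f in futures for j in f.result()]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    print(parallel_most_similar(data, n_workers=2))
```

With 10 million rows and 10 workers, block_ranges gives each worker exactly 1 million rows. Processes are the default because CPython threads do not execute Python code in parallel; the executor_cls parameter only exists so a ThreadPoolExecutor can be swapped in for quick testing.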

    This is the problem I'm trying to solve. The caveat is that I'm a graduate student in an MPI class, so I decided to use MPI to solve this problem. Personally, I think it's quite interesting to use MPI within SQL Server. I used the above hack to run a simple process on my dual-core desktop, and the results were decent and encouraging: both cores reached up to 100% utilization. Of course, these are naive results based on Windows Task Manager, but there was a clear difference in processor activity compared with the sequential run.

    What do you think?
    Thursday, November 22, 2007 10:48 PM
  • I think you should separate the work from the database. MPI may be used for that, although some SQL Server features might do it for you; I'm not enough of an expert on that. So let's try your approach, but in a different way.

    First, you should consider isolating your computational workload from the database itself, for several reasons: one is to leave SQL Server enough CPU power to do its own job; a second, from an architectural point of view, is independence from the data source itself.

     

    What you could possibly do is query the data from your MPI application. The main MPI rank can start by querying the database to get a data set, then create the MPI domains and broadcast the initial set of data.

    Then, on each slave, query the database for the domain that slave is working on, and do the matching job.
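One way to picture this layout: the master rank determines how many rows there are and broadcasts that number, and every rank then derives its own key range and queries only that slice. The sketch below assumes a hypothetical table dbo.Items with a dense, 0-based integer key RowId and uses ?-style ODBC parameter markers; it leaves out the actual MPI calls and database connection, and rank and size simply stand for the values MPI reports to each process.

```python
# Hypothetical table and key column; assumes RowId is a dense integer key 0..n-1.
TABLE = "dbo.Items"
KEY = "RowId"


def domain_for_rank(rank, size, n_rows):
    """Contiguous [start, end) range of row ids owned by `rank` out of `size` ranks."""
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    base, extra = divmod(n_rows, size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def slave_query(rank, size, n_rows):
    """SQL text plus parameters a rank could use to fetch only its own domain."""
    start, end = domain_for_rank(rank, size, n_rows)
    sql = f"SELECT * FROM {TABLE} WHERE {KEY} >= ? AND {KEY} < ?"
    return sql, (start, end)


if __name__ == "__main__":
    # On rank 0 this count would come from SELECT COUNT(*) and then be broadcast.
    n_rows, size = 10, 3
    for rank in range(size):
        print(rank, slave_query(rank, size, n_rows))
```

Filtering on a key range rather than using OFFSET/FETCH keeps the query usable on SQL Server 2005, which does not support that syntax.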

     

    But in reality there is no dependency between the MPI tasks, because each one works on a separate subset. So maybe MPI is not the most appropriate tool for this; a parametric sweep job could be an easier way to go.

     

    HTH

    Xavier

    Friday, November 23, 2007 12:29 PM
  • Yes, I guess I could do that. The idea was basically to allow users to use SQL-like queries to get answers.

    Anyway, do you know of a way to spawn MPI tasks from within code rather than calling mpirun on the console?

    Thanks again.
    Saturday, November 24, 2007 2:56 AM
  • Unfortunately, no. mpiexec is your friend here, but once again, you might consider changing your architecture.

    Monday, December 3, 2007 2:44 PM