MAS, API versions and DATA versions


  Hello everybody,

    1. At the moment there are several ways to get data from Microsoft Academic:

    a) through the web interface at http://academic.research.microsoft.com/

    b) by using the old MAS API (requests of the form: http://academic.research.microsoft.com/json.svc/search?StartIdx=1&PublicationContent=title,author&AuthorID=958&EndIdx=100&ResultObjects=publications&AppId=... )

    c) by using the Azure DataMarket API (requests of the form: https://api.datamarket.azure.com/MRC/MicrosoftAcademic/v2/...) and the web-based dataset explorer.

    Do all these APIs use exactly the same data? Which one is the most up to date and the most reliable?

    How often is the data updated? Is the data returned by (b) consistent with that of (c)?
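For reference, a request of form (b) can be assembled from its query parameters. The sketch below only reuses the endpoint and parameter names visible in the example URLs (AppId, ResultObjects, PublicationContent, StartIdx, EndIdx, AuthorID, PublicationID); the placeholder AppId, the page size of 100, and the reading of StartIdx/EndIdx as a 1-based inclusive window are assumptions inferred from the example (StartIdx=1, EndIdx=100), not documented limits.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request in (b).
MAS_SEARCH_URL = "http://academic.research.microsoft.com/json.svc/search"


def build_mas_search_url(app_id, start_idx, end_idx, **filters):
    """Assemble an old-MAS-API search URL from the parameters seen in (b).

    `filters` carries extra parameters such as AuthorID=958 or PublicationID=50.
    """
    params = {
        "AppId": app_id,  # caller-supplied key (the real value is elided in the post)
        "ResultObjects": "publications",
        "PublicationContent": "title,author",
        "StartIdx": start_idx,
        "EndIdx": end_idx,
    }
    params.update(filters)
    return MAS_SEARCH_URL + "?" + urlencode(params)


def page_windows(total, page_size=100):
    """Yield (StartIdx, EndIdx) pairs covering `total` results.

    Assumes a 1-based, inclusive window, as suggested by StartIdx=1&EndIdx=100.
    """
    for start in range(1, total + 1, page_size):
        yield start, min(start + page_size - 1, total)


url = build_mas_search_url("YOUR_APP_ID", 1, 10, PublicationID=50)
print(url)
print(list(page_windows(250)))  # [(1, 100), (101, 200), (201, 250)]
```

Note that `urlencode` escapes the comma in "title,author" as %2C, which is the form used in the PublicationID=50 request further down.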

    2. I am building a dataset (for some experiments) and I ran into the following situation:

    For PaperID = 50, the Azure DataMarket API returns three authors:

    PaperID  SeqID  AuthorID  Name             Affiliation  AffiliationID
    50       0      410000    Sebastian Thrun               0
    50       1      410000    Wolfram Burgard               0
    50       2      764982    Dieter Fox                    0

    Two of them are assigned the same AuthorID (410000). The actual publication ( http://www.cs.cmu.edu/~thrun/papers/thrun.maploc.pdf ) has three distinct authors.

    There are many such cases in the Paper_Author table.
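Cases like PaperID 50 can be flagged automatically. A minimal sketch, assuming the Paper_Author rows have already been downloaded as (PaperID, SeqID, AuthorID, Name) tuples (that tuple layout is an assumption; the sample rows are the ones from the table above): it reports every AuthorID that appears with more than one distinct name within the same paper.

```python
from collections import defaultdict

# Sample Paper_Author rows for PaperID 50, copied from the table above:
# (PaperID, SeqID, AuthorID, Name)
PAPER_AUTHOR_ROWS = [
    (50, 0, 410000, "Sebastian Thrun"),
    (50, 1, 410000, "Wolfram Burgard"),
    (50, 2, 764982, "Dieter Fox"),
]


def find_shared_author_ids(rows):
    """Return {PaperID: {AuthorID: sorted names}} for every AuthorID that is
    attached to more than one distinct name within the same paper."""
    names = defaultdict(lambda: defaultdict(set))
    for paper_id, _seq_id, author_id, name in rows:
        names[paper_id][author_id].add(name.strip())
    suspicious = {}
    for paper_id, by_author in names.items():
        clashes = {aid: sorted(n) for aid, n in by_author.items() if len(n) > 1}
        if clashes:
            suspicious[paper_id] = clashes
    return suspicious


print(find_shared_author_ids(PAPER_AUTHOR_ROWS))
# {50: {410000: ['Sebastian Thrun', 'Wolfram Burgard']}}
```

Counting distinct AuthorIDs for this paper gives 2, which happens to match the two authors returned by the old API; whether the old API really collapses authors by ID is only a guess.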

    The old API ( http://academic.research.microsoft.com/json.svc/search?StartIdx=1&PublicationContent=title%2Cauthor&PublicationID=50&EndIdx=10&ResultObjects=publications&AppId=...) returns only two authors.

    So I assume that it is preferable to get data from the Azure DataMarket. But is this going to be fixed (soon), or do we have to tackle the author identification problem ourselves in order to use the data?
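Until the data is fixed, one stopgap is to key authors on (AuthorID, normalized name) instead of AuthorID alone, so that a shared ID is split back into separate people. This is a deliberately naive sketch, not a real author-disambiguation method: the helper names are hypothetical, and the rule will wrongly split one person whose name is spelled differently across papers (e.g. initials vs. full first name).

```python
import unicodedata


def normalize_name(name):
    """Lower-case, strip accents and collapse whitespace (a naive rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def author_key(author_id, name):
    """Hypothetical surrogate key that separates different names sharing one AuthorID."""
    return (author_id, normalize_name(name))


# Rows for PaperID 50 as returned by the DataMarket: (PaperID, SeqID, AuthorID, Name)
rows = [
    (50, 0, 410000, "Sebastian Thrun"),
    (50, 1, 410000, "Wolfram Burgard"),
    (50, 2, 764982, "Dieter Fox"),
]
keys = {author_key(aid, name) for _pid, _seq, aid, name in rows}
print(len(keys))  # 3
```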

    3. In the old API, the aggregate fields such as PublicationCount, CitationCount, HIndex, etc. for each author, paper, etc. were very useful. Now, in order to find, for example, the 50 most productive authors in the domain "Databases", I have to:

    a. find (and download) from Paper_Category all the PaperIDs that belong to the specific category,

    b. find (and download) the Paper_Author records for all these papers (this implies thousands of requests),

    c. compute the total number of publications for each of these authors.
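Once the rows from steps a and b are downloaded, step c is a local aggregation. A minimal sketch, assuming Paper_Category rows are (PaperID, CategoryID) pairs and Paper_Author rows are (PaperID, SeqID, AuthorID, Name) tuples; both layouts and the numeric category ID are assumptions for illustration. Because it counts by AuthorID, it inherits the shared-ID problem described in point 2.

```python
from collections import Counter


def top_authors(paper_category_rows, paper_author_rows, category_id, n=50):
    """Steps a-c: papers in a category -> their authors -> publication counts.

    paper_category_rows: iterable of (PaperID, CategoryID)            (assumed layout)
    paper_author_rows:   iterable of (PaperID, SeqID, AuthorID, Name)
    """
    # a. PaperIDs that belong to the requested category
    papers = {pid for pid, cid in paper_category_rows if cid == category_id}
    # b. author links for those papers; the set counts each paper once per AuthorID
    pairs = {(pid, aid) for pid, _seq, aid, _name in paper_author_rows if pid in papers}
    # c. number of publications per AuthorID, highest first
    counts = Counter(aid for _pid, aid in pairs)
    return counts.most_common(n)


# Toy data; category 7 stands in for "Databases" (hypothetical ID).
cats = [(1, 7), (2, 7), (3, 9)]
authors = [(1, 0, 100, "A"), (1, 1, 200, "B"), (2, 0, 100, "A"), (3, 0, 200, "B")]
print(top_authors(cats, authors, category_id=7, n=2))  # [(100, 2), (200, 1)]
```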

    Am I missing something? Is there a better way to answer this kind of query?

    Thank you all,

    Antonis



    Monday, September 15, 2014 9:37 AM