FAQ: How to make your site indexed by Microsoft Academic Search

  • September 27, 2010, 9:35
    Owner
     
     

    Microsoft Academic Search uses a focused crawler to fetch data from the Internet. The following tips can help website administrators get their sites indexed by Academic Search easily and quickly.

    Crawler name and IP

    The Microsoft Academic Search crawler is called “librabot”. The user agent string in its HTTP requests is “librabot/2.0 (+http://academic.research.microsoft.com/)”. Our crawler’s IP ranges are 219.142.53.0/25, 202.96.51.128/25, and 131.107.65.248. An HTTP request from our crawler looks like this:

    GET http://www.microsoft.com/ HTTP/1.0
    Host: www.microsoft.com
    User-Agent: librabot/2.0 (+http://academic.research.microsoft.com/)
    Accept: text/html, text/plain, text/xml, application/*
    Accept-Encoding: identity;q=1.0
    From: librabot@microsoft.com

    Please make sure that your website can be accessed from these IP addresses and by this crawler name.
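
A site administrator who filters traffic may want to recognise the crawler before deciding whether to let a request through. The following is a minimal sketch in Python, assuming the IP ranges and crawler name given in this FAQ are still current. The function name `is_librabot_request` and the constants are hypothetical and only serve as an illustration.

```python
import ipaddress

# Address ranges copied from this FAQ; they may have changed since it was written.
LIBRABOT_NETWORKS = [
    ipaddress.ip_network("219.142.53.0/25"),
    ipaddress.ip_network("202.96.51.128/25"),
    ipaddress.ip_network("131.107.65.248/32"),
]
LIBRABOT_TOKEN = "librabot"  # crawler name as stated in this FAQ


def is_librabot_request(remote_ip: str, user_agent: str) -> bool:
    """Return True if both the source IP and the User-Agent match the crawler."""
    address = ipaddress.ip_address(remote_ip)
    in_range = any(address in network for network in LIBRABOT_NETWORKS)
    return in_range and LIBRABOT_TOKEN in user_agent.lower()
```

A firewall or application filter could call this helper with the remote address and the `User-Agent` header of each request and skip any blocking rule when it returns `True`.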

    Academic Search follows the robots.txt protocol. If your website has a robots.txt file, please make sure that it does not block the crawler. If you want to control how often the crawler accesses your site, set the Crawl-delay directive to a small value.
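
One way to check that a robots.txt file does not block the crawler is to parse it with Python's standard `urllib.robotparser` module. This is a sketch under two assumptions: that the crawler matches the `User-agent` token `librabot` (this FAQ only gives the crawler name), and that the sample rules and `www.example.org` URLs below stand in for your own. The helper `check_access` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: librabot may crawl everything except /private/,
# with a short crawl delay; all other crawlers are blocked.
ROBOTS_TXT = """\
User-agent: librabot
Crawl-delay: 1
Disallow: /private/

User-agent: *
Disallow: /
"""


def check_access(robots_txt: str, url: str, agent: str = "librabot"):
    """Return (allowed, crawl_delay) for the given agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    print(check_access(ROBOTS_TXT, "http://www.example.org/papers/paper1.pdf"))
    print(check_access(ROBOTS_TXT, "http://www.example.org/private/draft.pdf"))
```

Running the same check against the live file (for example with `RobotFileParser.set_url` and `read`) would confirm the deployed rules rather than a local copy.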

    Sitemap protocol

    Our crawler supports the Sitemap protocol. If you want your important content to be indexed quickly and in bulk, list the URLs of that content in your sitemap, and be sure to update the sitemap when the content of your website changes.
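
As an illustration of what such a sitemap contains, the sketch below builds a small one with Python's standard `xml.etree.ElementTree` module. The function name `build_sitemap`, the `www.example.org` URLs, and the dates are hypothetical. Only the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the public Sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified 'YYYY-MM-DD') pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod tells the crawler when the page last changed.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    papers = [
        ("http://www.example.org/papers/paper1.pdf", "2010-09-01"),
        ("http://www.example.org/papers/paper2.pdf", "2010-09-15"),
    ]
    print(build_sitemap(papers))
```

Regenerating the file whenever papers are added or changed keeps the `lastmod` values meaningful for the crawler.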

    Make the parsing and exploration of your website easier

    Don’t use complex dynamic web technologies such as Flash or Ajax for your important content. We suggest that you provide a simplified form of your important content for the crawler (and keep the complex one for real users).

    Don’t make your important content hard to discover. If some content can only be found by issuing a query, then it can’t be indexed by our crawler. We suggest you make a list page that includes all URLs of important content.

    We also suggest you put papers into PDF or Word format, rather than HTML.

    Contact us

    Please contact us directly if your website is not indexed. We will be happy to analyze the problem.




All replies

  • October 30, 2010, 22:42
     
     

    Hi,

    Does the Microsoft Academic Search crawler use any bibliographic metadata (that is generally embedded in HTML META tags)?

    Best regards,

    --Martin

  • November 6, 2010, 10:14
    Moderator
     
     

    The current version of our crawler doesn't read bibliographic HTML META tags.

    However, we support the OAI interface (Dublin Core format). If you have metadata for a large number of papers and provide it through an OAI interface, please tell us the address of that interface.
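
For repository owners who want to check what a harvester would see, the following sketch builds an OAI-PMH `ListRecords` request URL and reads the Dublin Core titles out of a response. It assumes the standard `oai_dc` metadata prefix. The base URL `http://www.example.org/oai`, the sample response, and the helper names are hypothetical, and this is not a description of how the Academic Search harvester itself is implemented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hand-written sample response, trimmed to the elements used below.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Paper</dc:title>
      <dc:creator>A. Author</dc:creator>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the URL of a ListRecords request in Dublin Core format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml: str):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iterfind(".//dc:title", DC_NS)]


if __name__ == "__main__":
    print(list_records_url("http://www.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Fetching the generated URL from your own repository and checking that titles, creators, and identifiers come back as expected is a quick sanity test before submitting the interface address.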

  • December 1, 2010, 2:54
    Moderator
     
     

    The following table shows the sites from which we have crawled papers so far. Only the top 100 sites are listed.

     

    Site Papers
    redalyc.uaemex.mx 35546
    hal.archives-ouvertes.fr 31267
    cancerres.aacrjournals.org 28334
    bloodjournal.hematologylibrary.org 27094
    www.aaai.org 19482
    jvi.asm.org 19130
    www.aclweb.org 17260
    nar.oxfordjournals.org 17096
    www.math.ethz.ch 16577
    www.emis.de 16417
    jb.asm.org 16132
    www.cs.cmu.edu 15722
    aem.asm.org 15703
    emis.maths.adelaide.edu.au 15653
    research.microsoft.com 15438
    www.maths.soton.ac.uk 15042
    emis.luc.ac.be 14839
    www.scielo.br 14780
    reference.kfupm.edu.sa 14631
    jas.fass.org 14572
    www.plantphysiol.org 14080
    jds.fass.org 13557
    www.maths.tcd.ie 13515
    iai.asm.org 13393
    jcm.asm.org 13378
    ageconsearch.umn.edu 12829
    www.jneurosci.org 12482
    www.jimmunol.org 12387
    mcb.asm.org 12347
    www.genetics.org 12220
    www.wseas.us 11588
    jcb.rupress.org 11456
    www.math.helsinki.fi 11446
    jn.nutrition.org 11401
    epaper.kek.jp 11302
    admin.xosn.com 11173
    www.ams.org 10626
    jem.rupress.org 10624
    circ.ahajournals.org 10612
    aac.asm.org 10171
    vir.sgmjournals.org 10089
    www.biomedcentral.com 9886
    www.clinchem.org 9885
    www.fs.fed.us 9701
    mic.sgmjournals.org 9592
    www.ajronline.org 9434
    accelconf.web.cern.ch 9277
    www.mat.ub.es 9236
    emis.maths.tcd.ie 8983
    jcem.endojournals.org 8763
    www.stanford.edu 8762
    emis.library.cornell.edu 8666
    www.iovs.org 8034
    acl.ldc.upenn.edu 8007
    emis.math.ca 7958
    intl.plantphysiol.org 7857
    www.emis.math.ca 7811
    www.ias.ac.in 7786
    www.nber.org 7640
    www.anesthesia-analgesia.org 7622
    www.univie.ac.at 7447
    web.mit.edu 7390
    www.clevelandfed.org 7308
    www.wjgnet.com 7190
    mathnet.preprints.org 7094
    endo.endojournals.org 7036
    www.akademik.unsri.ac.id 6910
    ams.confex.com 6805
    content.onlinejacc.org 6799
    humrep.oxfordjournals.org 6759
    ats.ctsnetjournals.org 6757
    stroke.ahajournals.org 6746
    www.emis.ams.org 6608
    www.jlr.org 6563
    content.nejm.org 6534
    www.molbiolcell.org 6515
    www.ajcn.org 6449
    ndt.oxfordjournals.org 6435
    emis.math.tifr.res.in 6349
    wing.comp.nus.edu.sg 6259
    www.academicjournals.org 6146
    reports-archive.adm.cs.cmu.edu 6146
    www.jleukbio.org 6048
    jeb.biologists.org 6001
    iahs.info 5979
    www.usenix.org 5961
    www.ars.usda.gov 5856
    bioinformatics.oxfordjournals.org 5827
    www.cc.gatech.edu 5724
    www.princeton.edu 5587
    infoscience.epfl.ch 5565
    circres.ahajournals.org 5556
    hyper.ahajournals.org 5541
    jnm.snmjournals.org 5470
    pediatrics.aappublications.org 5441
    halshs.archives-ouvertes.fr 5359
    jac.oxfordjournals.org 5203
    www.ajnr.org 5180
    ajrccm.atsjournals.org 5048
    www.slac.stanford.edu 5048

  • December 19, 2010, 10:01
     
     
    Surprisingly, arxiv.org is not crawled (it hosts about 650k papers).
  • December 20, 2010, 3:17
     
     

    Hi Vincent,

    Thanks for your feedback! The table lists only the top 100 sites from which we have crawled papers. arxiv.org is in fact crawled as well.

    For example, you can view this page: http://academic.research.microsoft.com/Paper/120150.aspx 

    "arxiv.org" is among the view and download links.

    Best wishes


    Microsoft Academic Search Team
  • December 26, 2010, 23:06
     
     

    How can I give you the URL? Can you add the URL below to your crawler list?

    http://virtualbib.fgv.br/oai/request

    This is the URL of the OAI-PMH interface of Fundação Getulio Vargas' digital library:

    http://academic.research.microsoft.com/Organization/7003

    Cheers,

    Alexandre Rademaker

     

  • December 28, 2010, 3:15
     
     

    Hi arademaker,

    Thanks for the information! We have received your request and notified a team member to make the necessary changes. Please check back later; because of our process, the change may not appear online right away. We appreciate your patience and continued support!

    Best wishes


    Microsoft Academic Search Team
  • January 3, 2011, 0:15
     
     

    Many thanks, Caroline! I am a researcher at the Getulio Vargas Foundation and also the project manager of our Digital Library. Let me know if there is anything we can do to help your crawler...

    Happy new year!

    Cheers,

    Alexandre

  • January 3, 2011, 6:11
     
     

    Happy New Year to you and your team, too.


    Microsoft Academic Search Team
  • January 3, 2011, 10:36
    Moderator
     
     

    Hi arademaker,

    Could you provide detailed information about your OAI service, such as the format and any user account required? You can send mail directly to qingyu@microsoft.com

  • April 3, 2013, 11:17
     
     
    Respected Sir/Madam,
    Please help us include our journal in the Microsoft Academic database. Some of our manuscripts have already been added by their authors, but the journal name (International Journal of Research in Computer Science) is not available in the index. Please help us in this regard.
    The following is the link to our journal's OAI data:
    http://ijorcs.org/oai/oai2.php?verb=ListRecords&metadataPrefix=oai_dc
  • May 21, 2013, 17:01
     
     
    Good day, Qing Yu.

    How can we add our Russian open-access scientific library to Microsoft Academic Search? I wrote to acadfb@microsoft.com on April 14 and unfortunately received no answer.

    Our library is called CyberLeninka: http://cyberleninka.ru/

    Right now we host nearly 80,000 Russian scientific articles from more than 160 scientific journals, and the collection is growing.

    We have an OAI-PMH interface: http://www.openarchives.org/Register/BrowseSites?viewRecord=http://cyberleninka.ru/oai

    Our MARC Organisation Code is RU-MoCYL (normalized: rumocyl).
    If you have someone who speaks Russian, you can learn more about us in this video: http://www.youtube.com/watch?v=b0ZDh28X4yI

    About me: http://keldysh.ru/persons/english/semyachkin.html