FAQ: How to make your site indexed by Microsoft Academic Search
-
September 27, 2010 9:35 Owner
Microsoft Academic Search uses a focused crawler to fetch data from the Internet. The following tips can help website administrators get their sites indexed by Academic Search easily and quickly.
Crawler name and IP
The Microsoft Academic Search crawler is called “librabot”. The user agent string in the HTTP request is “librabot/2.0 (+http://academic.research.microsoft.com/)”. Our crawler’s IP ranges are 219.142.53.0/25, 202.96.51.128/25, and 131.107.65.248. An HTTP request from our crawler looks like this:
GET http://www.microsoft.com/ HTTP/1.0
Host: www.microsoft.com
User-Agent: librabot/2.0 (+http://academic.research.microsoft.com/)
Accept: text/html, text/plain, text/xml, application/*
Accept-Encoding: identity;q=1.0
From: librabot@microsoft.com
Please make sure that your website accepts requests from these IP addresses and from this user agent.
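To check whether a request in your server logs matches the crawler described above, something like the following Python sketch may help. The function name `is_librabot_request` is hypothetical, and the ranges are simply copied from this FAQ, so they may be out of date.

```python
import ipaddress

# IP ranges as listed in this FAQ (assumption: still current for your logs).
LIBRABOT_NETWORKS = [
    ipaddress.ip_network("219.142.53.0/25"),
    ipaddress.ip_network("202.96.51.128/25"),
    ipaddress.ip_network("131.107.65.248/32"),
]
LIBRABOT_TOKEN = "librabot"


def is_librabot_request(remote_ip: str, user_agent: str) -> bool:
    """Return True if both the user agent and the IP match the FAQ's description."""
    if LIBRABOT_TOKEN not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in LIBRABOT_NETWORKS)
```

A firewall or web server rule that blocks either the address ranges or the user agent would make this check fail for every logged request, which is a quick sign that the crawler cannot reach the site.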
Academic Search follows the robots.txt protocol. If your website has a robots.txt file, please make sure it does not block the crawler. If you want to control how often the crawler accesses your site, set Crawl-delay to a small value.
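One way to test a robots.txt file before publishing it is Python's standard `urllib.robotparser`. This is only a sketch: the sample rules, the example.org URLs, and the helper name `librabot_access` are made up for illustration, and the standard-library parser may not interpret every rule exactly as the crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: lets librabot in (except /private/), blocks other bots.
SAMPLE_ROBOTS_TXT = """\
User-agent: librabot
Crawl-delay: 2
Disallow: /private/

User-agent: *
Disallow: /
"""


def librabot_access(robots_txt: str, url: str):
    """Return (allowed, crawl_delay) for the librabot user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("librabot", url), parser.crawl_delay("librabot")


allowed, delay = librabot_access(
    SAMPLE_ROBOTS_TXT, "http://www.example.org/papers/1.pdf"
)
print(allowed, delay)
```

If `allowed` comes back as False for a URL you want indexed, the file is blocking the crawler for that path.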
Sitemap protocol
Our crawler supports the sitemap protocol. If you want your important content to be indexed quickly and in bulk, list the URLs of that content in a sitemap, and be sure to update the sitemap whenever the content of your website changes.
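A sitemap can be generated from a list of paper URLs with a few lines of Python. The sketch below assumes the standard sitemaps.org 0.9 format; the function name `build_sitemap` and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a date string such as "2010-09-27".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.org/papers/1.pdf", "2010-09-27"),
    ("http://www.example.org/papers/2.pdf", "2010-11-17"),
])
print(sitemap_xml)
```

Regenerating the file whenever papers are added or changed keeps the `lastmod` values meaningful.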
Make the parsing and exploration of your website easier
Don’t use a complex dynamic web technology such as Flash or Ajax for your important content. We suggest that you use a simplified form of your important content for the crawler only (keep the complex one for real users).
Don’t make your important content hard to discover. If some content can only be found by issuing a query, then it can’t be indexed by our crawler. We suggest you make a list page that includes all URLs of important content.
We also suggest you put papers into PDF or Word format, rather than HTML.
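The list page suggested above can be a plain static HTML file with one ordinary link per paper. A minimal sketch, with a hypothetical function name `build_list_page` and made-up example.org URLs:

```python
from html import escape


def build_list_page(title, papers):
    """Return a static HTML page linking to every paper.

    papers: iterable of (link text, URL) pairs.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(text)}</a></li>'
        for text, url in papers
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n"
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>\n"
    )


page = build_list_page(
    "All papers",
    [("Paper A & B", "http://www.example.org/papers/a.pdf")],
)
print(page)
```

Because the links are plain `<a href>` elements, a crawler can follow them without running scripts or submitting a query.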
Contact us
Please contact us directly if your website is not indexed. We will be happy to analyze the problem.
- Edited by Qing Yu (Microsoft Employee, Moderator), November 17, 2010 10:12: change crawler ip
- Type changed by Cherry CHE (Microsoft Employee, Owner), February 22, 2011 6:00
- Edited by Cherry CHE (Microsoft Employee, Owner), December 12, 2011 6:07
- Edited by Thomas, Academic Search Editor (Moderator), February 11, 2013 23:55: Made minor edits.
- Edited by Thomas, Academic Search Editor (Moderator), February 11, 2013 23:56: Made minor edits.
All replies
-
October 30, 2010 22:42
Hi,
Does the Microsoft Academic Search crawler use any bibliographic metadata (that is generally embedded in HTML META tags)?
- Dublin Core ?
- Prism ?
- Eprints (eprints.*) ?
- HighWire Press (citation_*)?
- COinS ?
Best regards,
--Martin
-
November 6, 2010 10:14 Moderator
The current version of our crawler does not read these bibliographic HTML META tags.
However, we support the OAI interface (Dublin Core format). If you have metadata for a large number of papers and provide it through an OAI interface, please tell us the address of that interface.
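For repository owners who want to check what their OAI interface exposes in Dublin Core, here is a rough Python sketch. It assumes a standard OAI-PMH 2.0 endpoint; the base URL, the sample response, and the function names `list_records_url` and `extract_titles` are hypothetical, and nothing here describes how the Academic Search harvester itself works.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used inside oai_dc records.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical, heavily shortened ListRecords response for illustration.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return every dc:title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", DC_NS)]


print(list_records_url("http://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

Opening the generated URL in a browser is usually enough to confirm that the endpoint answers and that titles and authors appear in the records.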
-
December 1, 2010 2:54 Moderator
The following table shows the sites we have crawled papers from so far. Only the top 100 sites are listed.
Site    PaperNumber
redalyc.uaemex.mx    35546
hal.archives-ouvertes.fr    31267
cancerres.aacrjournals.org    28334
bloodjournal.hematologylibrary.org    27094
www.aaai.org    19482
jvi.asm.org    19130
www.aclweb.org    17260
nar.oxfordjournals.org    17096
www.math.ethz.ch    16577
www.emis.de    16417
jb.asm.org    16132
www.cs.cmu.edu    15722
aem.asm.org    15703
emis.maths.adelaide.edu.au    15653
research.microsoft.com    15438
www.maths.soton.ac.uk    15042
emis.luc.ac.be    14839
www.scielo.br    14780
reference.kfupm.edu.sa    14631
jas.fass.org    14572
www.plantphysiol.org    14080
jds.fass.org    13557
www.maths.tcd.ie    13515
iai.asm.org    13393
jcm.asm.org    13378
ageconsearch.umn.edu    12829
www.jneurosci.org    12482
www.jimmunol.org    12387
mcb.asm.org    12347
www.genetics.org    12220
www.wseas.us    11588
jcb.rupress.org    11456
www.math.helsinki.fi    11446
jn.nutrition.org    11401
epaper.kek.jp    11302
admin.xosn.com    11173
www.ams.org    10626
jem.rupress.org    10624
circ.ahajournals.org    10612
aac.asm.org    10171
vir.sgmjournals.org    10089
www.biomedcentral.com    9886
www.clinchem.org    9885
www.fs.fed.us    9701
mic.sgmjournals.org    9592
www.ajronline.org    9434
accelconf.web.cern.ch    9277
www.mat.ub.es    9236
emis.maths.tcd.ie    8983
jcem.endojournals.org    8763
www.stanford.edu    8762
emis.library.cornell.edu    8666
www.iovs.org    8034
acl.ldc.upenn.edu    8007
emis.math.ca    7958
intl.plantphysiol.org    7857
www.emis.math.ca    7811
www.ias.ac.in    7786
www.nber.org    7640
www.anesthesia-analgesia.org    7622
www.univie.ac.at    7447
web.mit.edu    7390
www.clevelandfed.org    7308
www.wjgnet.com    7190
mathnet.preprints.org    7094
endo.endojournals.org    7036
www.akademik.unsri.ac.id    6910
ams.confex.com    6805
content.onlinejacc.org    6799
humrep.oxfordjournals.org    6759
ats.ctsnetjournals.org    6757
stroke.ahajournals.org    6746
www.emis.ams.org    6608
www.jlr.org    6563
content.nejm.org    6534
www.molbiolcell.org    6515
www.ajcn.org    6449
ndt.oxfordjournals.org    6435
emis.math.tifr.res.in    6349
wing.comp.nus.edu.sg    6259
www.academicjournals.org    6146
reports-archive.adm.cs.cmu.edu    6146
www.jleukbio.org    6048
jeb.biologists.org    6001
iahs.info    5979
www.usenix.org    5961
www.ars.usda.gov    5856
bioinformatics.oxfordjournals.org    5827
www.cc.gatech.edu    5724
www.princeton.edu    5587
infoscience.epfl.ch    5565
circres.ahajournals.org    5556
hyper.ahajournals.org    5541
jnm.snmjournals.org    5470
pediatrics.aappublications.org    5441
halshs.archives-ouvertes.fr    5359
jac.oxfordjournals.org    5203
www.ajnr.org    5180
ajrccm.atsjournals.org    5048
www.slac.stanford.edu    5048
- Marked as answer by Qing Yu (Microsoft Employee, Moderator), December 1, 2010 2:56
- Unmarked as answer by Qing Yu (Microsoft Employee, Moderator), December 1, 2010 2:56
-
December 19, 2010 10:01
Surprisingly, arxiv.org is not crawled (it hosts about 650k papers).
-
December 20, 2010 3:17
Hi Vincent,
Thanks for your feedback! The table lists only the top 100 sites we have crawled papers from. arxiv.org is also crawled.
For example, you can view this page: http://academic.research.microsoft.com/Paper/120150.aspx
"arxiv.org" is among the view and download links.
Best wishes
Microsoft Academic Search Team
-
December 26, 2010 23:06
How can I submit the URL? Can you add the URL below to your crawler list?
http://virtualbib.fgv.br/oai/request
This is the URL of the OAI-PMH interface of Fundação Getulio Vargas' digital library:
http://academic.research.microsoft.com/Organization/7003
Cheers,
Alexandre Rademaker
-
December 28, 2010 3:15
Hi arademaker,
Thanks for the information! We have received your request and notified a team member to make the necessary changes. Please check back later; because of our process, the change may not appear online right away. We appreciate your patience and continued support!
Best wishes
Microsoft Academic Search Team
-
January 3, 2011 0:15
Many thanks Caroline! I am a researcher at Getulio Vargas Foundation and also the project manager of our Digital Library. Let me know if we can do anything to help your crawler...
Happy new year!
Cheers,
Alexandre
-
January 3, 2011 6:11
Happy New Year to you and your team, too.
Microsoft Academic Search Team
-
January 3, 2011 10:36 Moderator
Hi arademaker,
Could you provide detailed information about your OAI service, such as the format and user account? You can send mail directly to qingyu@microsoft.com
-
April 3, 2013 11:17
Respected Sir/Madam,
Please help us include our journal in the Microsoft Academic database. Some of our manuscripts have already been added by the authors, but the journal name (International Journal of Research in Computer Science) is not available in the index. Please help us in this regard.
Following is the link to our journal's OAI data.
http://ijorcs.org/oai/oai2.php?verb=ListRecords&metadataPrefix=oai_dc
- Edited by White Globe, April 3, 2013 11:18
-
May 21, 2013 17:01
Good day, Qing Yu.
How can we add our Russian Open Access scientific library to Microsoft Academic Search? I wrote to acadfb@microsoft.com on April 14 and unfortunately received no answer.
Our library is called CyberLeninka: http://cyberleninka.ru/
Right now we host nearly 80000 Russian scientific articles from more than 160 scientific journals, and the collection is growing.
We have an OAI-PMH interface: http://www.openarchives.org/Register/BrowseSites?viewRecord=http://cyberleninka.ru/oai
Our MARC Organisation Code is RU-MoCYL (normalized: rumocyl).
If you have someone who speaks Russian, you can learn more about us in this video: http://www.youtube.com/watch?v=b0ZDh28X4yI
About me: http://keldysh.ru/persons/english/semyachkin.html