
msnbot is using MY robots.txt to crawl YOUR site

  • Saturday, April 11, 2009 4:17 PM, bmpub
     
    For some reason, msnbot/2.0b is visiting the wrong IP addresses to retrieve robots.txt. In other words, it THINKS it is getting robots.txt for www.yoursite.com, but it is really reading the robots.txt file that is served for the default host at the IP address for www.mysite.com (not necessarily www.mysite.com's robots.txt). Clearly, msnbot/2.0b is using the wrong DNS lookup for its requests.

    What does this mean? This means that msnbot/2.0b may not obey your robots.txt because it's reading MINE instead.

    Oh, my! Is this serious? Yes. Your search results can now be influenced by another site, either accidentally or intentionally.

    Do you want msnbot to index your site? It will not index your site if MY robots.txt disallows it.

    Have you disallowed indexing? It will still index your site if MY robots.txt allows it.

    Have you asked msnbot to ignore specific directories? They will be indexed if msnbot is using MY robots.txt.
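To illustrate the three cases above, here is a minimal sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies and the helper `can_crawl` are made up for illustration; they are not taken from any real site. It only shows how the same URL gets a different verdict depending on whose robots.txt the crawler reads.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that YOUR site serves: hide one directory only.
YOUR_ROBOTS = """User-agent: msnbot
Disallow: /private/
"""

# Hypothetical robots.txt served by MY default host: block msnbot entirely.
MY_ROBOTS = """User-agent: msnbot
Disallow: /
"""

# Hypothetical robots.txt that allows everything.
OPEN_ROBOTS = """User-agent: *
Disallow:
"""

AGENT = "msnbot/2.0b"
PUBLIC = "http://www.yoursite.com/index.html"
PRIVATE = "http://www.yoursite.com/private/page.html"

# What you intended: public pages crawled, /private/ skipped.
intended = (can_crawl(YOUR_ROBOTS, AGENT, PUBLIC),
            can_crawl(YOUR_ROBOTS, AGENT, PRIVATE))

# What happens if the bot reads MY robots.txt instead: nothing is crawled.
blocked = (can_crawl(MY_ROBOTS, AGENT, PUBLIC),
           can_crawl(MY_ROBOTS, AGENT, PRIVATE))

# What happens if the other file allows everything: /private/ is crawled too.
exposed = (can_crawl(OPEN_ROBOTS, AGENT, PUBLIC),
           can_crawl(OPEN_ROBOTS, AGENT, PRIVATE))

print(intended, blocked, exposed)
```

The rules you wrote never enter the decision once the wrong file is fetched, which is why this is outside the webmaster's control.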

    I already block msnbot because it has behaved badly in the past, so why should I care? You should care because msnbot/2.0b is now beginning to index site content, moving beyond simply retrieving robots.txt. This means that malicious sites can be set up to return intentionally misleading, obscene or spammy content for ANY msnbot request. This could cause many undesirable consequences, including Live Search results that make it appear that your site hosts such content when it actually doesn't. It may also increase the number of page-not-found errors on your site, since the requested links may not exist there. If you rely solely on robots.txt to exclude msnbot, this may no longer work, because it is using someone else's robots.txt file instead.

    How do I confirm this? Search your web log for requests from msnbot/2.0b. Do you see requests for links that don't exist on your site? That's because they exist on a different site, the one msnbot/2.0b THINKS it's crawling. If you log the requested server name, do you see unfamiliar hosts? Those are the ones msnbot/2.0b THINKS it's visiting.
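The log check described above could be scripted roughly as follows. This is only a sketch: it assumes an Apache-style combined log with the requested server name prepended as the first field, and your own log format may well differ. The sample lines, the host `www.othersite.example`, the IP addresses, and the function name `suspicious_msnbot_requests` are all invented for illustration.

```python
import re

# Assumed layout: vhost, client IP, ident, user, [time], "request", status,
# size, "referer", "user-agent". Adjust the pattern to match your own logs.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def suspicious_msnbot_requests(lines, known_hosts):
    """Return (host, path, status, reason) for msnbot/2.0b requests that
    name an unfamiliar host or ask for a page that does not exist."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "msnbot/2.0b" not in m["agent"]:
            continue
        if m["host"].lower() not in known_hosts:
            hits.append((m["host"], m["path"], m["status"], "unfamiliar host"))
        elif m["status"] == "404":
            hits.append((m["host"], m["path"], m["status"], "page not found"))
    return hits


# Invented sample log lines, not real traffic.
SAMPLE = [
    'www.yoursite.com 192.0.2.10 - - [11/Apr/2009:16:17:00 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/2.0b"',
    'www.othersite.example 192.0.2.10 - - [11/Apr/2009:16:17:05 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/2.0b"',
    'www.yoursite.com 192.0.2.10 - - [11/Apr/2009:16:17:09 +0000] '
    '"GET /no-such-page.html HTTP/1.1" 404 0 "-" "msnbot/2.0b"',
    'www.yoursite.com 192.0.2.77 - - [11/Apr/2009:16:18:00 +0000] '
    '"GET /missing.html HTTP/1.1" 404 0 "-" "SomeOtherBot/1.0"',
]

hits = suspicious_msnbot_requests(SAMPLE, known_hosts={"www.yoursite.com"})
for hit in hits:
    print(hit)
```

To run it against a real file, pass an open file object instead of `SAMPLE` and list every host name your server legitimately answers for in `known_hosts`.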

    This beta bot is annoying and potentially damaging in a way that is outside of the webmaster's control. Microsoft, please shut it down immediately and do not deploy it again until this issue is fixed. And next time, pay attention to what it's actually doing. A few simple tests or packet inspections would have exposed this flaw immediately. OUR websites are not YOUR testbeds.

Answers

  • Wednesday, April 15, 2009 5:21 PM, Brett Yount
     Answer
    Thank you for catching this bug. This issue should be fixed shortly. Thank you for your patience during our msnbot 2.0 beta.

    Brett
    Program Manager, Live Search Webmaster Tools
    • Marked As Answer by Brett Yount, Wednesday, April 15, 2009 5:21 PM

All Replies

  • Monday, April 13, 2009 3:14 PM, Brett Yount
     
    Hi,

    We have a team looking at this. I will post back once we have an update.

    Thanks,

    Brett
    Program Manager, Live Search Webmaster Tools