For some reason, msnbot/2.0b is visiting the wrong IP addresses to retrieve robots.txt. In other words, it THINKS it is getting robots.txt for www.yoursite.com, but it is really reading the robots.txt file that is served for the default host at the IP address for www.mysite.com (not necessarily www.mysite.com's robots.txt). Clearly, msnbot/2.0b is using the wrong DNS lookup for its requests.
What does this mean? This means that msnbot/2.0b may not obey your robots.txt because it's reading MINE instead.
Oh, my! Is this serious? Yes. Your search results can now be influenced by another site, either accidentally or intentionally.
Do you want msnbot to index your site? It will not index your site if MY robots.txt disallows it.
Have you disallowed indexing? It will still index your site if MY robots.txt allows it.
Have you asked msnbot to ignore specific directories? They will be indexed if msnbot is using MY robots.txt.
I already block msnbot because it has behaved badly in the past, so why should I care? You should care because msnbot/2.0b is now beginning to index site content, moving beyond simply retrieving robots.txt. This means that malicious sites can be set up to return intentionally misleading, obscene or spammy content for ANY msnbot request. This could cause many undesirable consequences, including Live Search results that make it appear that your site hosts such content when it actually doesn't. It may also increase the number of page-not-found errors on your site, since the requested links may not exist there. If you rely solely on robots.txt to exclude msnbot, this may no longer work, because it is using someone else's robots.txt file instead.
How do I confirm this? Search your web log for requests from msnbot/2.0b. Do you see requests for links that don't exist on your site? That's because they exist on a different site, the one msnbot/2.0b THINKS it's crawling. If you log the requested server name, do you see unfamiliar hosts? Those are the ones msnbot/2.0b THINKS it's visiting.
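If you'd rather not eyeball the log, the check above can be roughly scripted. This is only a sketch, and it makes assumptions you will need to adjust: it assumes each log line starts with the requested host name (the Host header) followed by the usual combined log fields, which is NOT a default format on most servers. The function name, the sample lines and the addresses in them are made up for illustration.

```python
import re
from collections import Counter

# ASSUMED line layout (hypothetical; change the pattern to match your own log format):
#   host client_ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def suspicious_msnbot_requests(lines, my_hosts):
    """Count msnbot/2.0b requests for hosts that are not mine, and requests that got a 404."""
    my_hosts = {h.lower() for h in my_hosts}
    unfamiliar_hosts = Counter()
    not_found_paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # line does not fit the assumed layout
        if "msnbot/2.0b" not in match.group("agent").lower():
            continue  # some other visitor
        # Drop an optional ":port" suffix before comparing host names.
        host = match.group("host").lower().split(":")[0]
        if host not in my_hosts:
            unfamiliar_hosts[host] += 1
        if match.group("status") == "404":
            not_found_paths[match.group("path")] += 1
    return unfamiliar_hosts, not_found_paths


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    'www.yoursite.com 192.0.2.10 - - [11/Apr/2009:16:00:00 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    'www.mysite.com 192.0.2.10 - - [11/Apr/2009:16:00:05 +0000] '
    '"GET /no-such-page.html HTTP/1.1" 404 310 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    'www.mysite.com 192.0.2.99 - - [11/Apr/2009:16:00:09 +0000] '
    '"GET /other.html HTTP/1.1" 404 310 "-" "Mozilla/5.0"',
]
```

To run it against a real log, pass the open file and the host names you actually serve, e.g. suspicious_msnbot_requests(open("access.log"), ["www.yoursite.com"]), where access.log is a placeholder path. Any host in the first counter is one msnbot/2.0b THINKS it's visiting at your IP address.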
This beta bot is annoying and potentially damaging in a way that is outside of the webmaster's control. Microsoft, please shut it down immediately and do not deploy it again until this issue is fixed. And next time, pay attention to what it's actually doing. A few simple tests or packet inspections would have exposed this flaw immediately. OUR websites are not YOUR testbeds.
Saturday, April 11, 2009 4:17 PM