On www.geograph.org.uk we are noticing that msnbot doesn't seem to honour robots.txt.
Relevant extract from robots.txt:
User-agent: *
Disallow: /mapbrowse.php
Disallow: /map/
Disallow: /mapper/
Disallow: /maplarge.php
Disallow: /mapprint.php
....
Disallow: /editimage.php
Disallow: /browse.php
...
etc
As far as I can see the file is fine, and these directives have been there for a while.
Some example lines from the access log, for URLs that shouldn't be crawled:
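One way to sanity-check the file is to feed the quoted rules to Python's standard `urllib.robotparser` and ask whether a given path may be fetched. This is only a rough sketch: `ROBOTS_EXCERPT` holds just the lines quoted above (the elided parts of the real file are not reproduced), and `is_allowed` is a helper name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Only the rules quoted in this post; the elided lines of the real
# robots.txt are deliberately left out rather than guessed.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /mapbrowse.php
Disallow: /map/
Disallow: /mapper/
Disallow: /maplarge.php
Disallow: /mapprint.php
Disallow: /editimage.php
Disallow: /browse.php
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the excerpt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_EXCERPT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/browse.php?p=290852", "/profile/3860?expand=1"):
        print(path, is_allowed("msnbot/2.0b", path))
```

Any path whose rule sits in the elided part of the file will be reported as allowed by this sketch, because only the quoted rules are loaded.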
65.55.106.191 - 0 [16/May/2009:07:32:11 +0100] "GET /maplarge.php?t=tolJ5ojXXJ0ojXJFojXXJfoMObJqoVOXJL5405oZZbXNbt8tOXhwZu4 HTTP/1.1" 503 5307 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.191 - 0 [16/May/2009:07:33:30 +0100] "GET /maplarge.php?t=tolJ5ojXXJ0ojXJFojXXJfoMObJqoVOXJL5405oZZbXNbt8tOXhwZu4 HTTP/1.0" 200 37166 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.236 - 0 [16/May/2009:07:58:03 +0100] "GET /browse.php?p=290852 HTTP/1.1" 200 6164 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.164 - 0 [16/May/2009:08:18:02 +0100] "GET /editimage.php?id=1082059 HTTP/1.1" 200 6063 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.147 - 0 [16/May/2009:08:18:32 +0100] "GET /browse.php?p=382785 HTTP/1.1" 200 6905 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.188 - 0 [16/May/2009:08:18:35 +0100] "GET /mapbrowse.php?t=tolJ5oOXXJ0oOXJFoOXXJfobXNJqoVjMJL5405oVMbNwlMbujNNZZuw&gridref_from=SJ9892 HTTP/1.1" 200 7495 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.207 - 0 [16/May/2009:08:32:07 +0100] "GET /browse.php?p=286331 HTTP/1.1" 200 6069 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.165 - 0 [16/May/2009:08:43:17 +0100] "GET /browse.php?p=283628 HTTP/1.1" 200 6036 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.218 - 0 [16/May/2009:09:05:01 +0100] "GET /reuse.php?id=857043 HTTP/1.1" 200 19990 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.115 - 0 [16/May/2009:13:01:31 +0100] "GET /browse.php?p=644008 HTTP/1.1" 200 5978 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.107 - 0 [16/May/2009:13:28:03 +0100] "GET /browse.php?p=404440 HTTP/1.1" 200 6038 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.241 - 0 [16/May/2009:14:06:33 +0100] "GET /ecard.php?image=643562 HTTP/1.1" 200 6061 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.140 - 0 [16/May/2009:14:28:30 +0100] "GET /browse.php?p=426977 HTTP/1.1" 200 5946 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.114 - 0 [16/May/2009:14:42:22 +0100] "GET /browse.php?p=512392 HTTP/1.1" 200 6134 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.230 - 0 [16/May/2009:15:18:32 +0100] "GET /mapbrowse.php?t=tolJ5oOXXJ0oOXJFoOXXJfobObJqoVjXJL5405o4lVXZwNtNwXjXaMu&gridref_from=SK4384 HTTP/1.1" 200 7609 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.241 - 0 [16/May/2009:15:23:02 +0100] "GET /browse.php?p=154252 HTTP/1.1" 200 6780 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
(According to reverse DNS, these look like valid MSN IPs.)
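For anyone wanting to repeat the reverse DNS check, a forward-confirmed lookup is a bit stronger than a reverse lookup alone: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname back and confirm it includes the same IP. The sketch below uses only the standard `socket` module. The function name `is_verified_crawler` and the default suffix `.search.msn.com` are assumptions for illustration; adjust the suffix to whatever the reverse lookups actually return.

```python
import socket


def is_verified_crawler(ip, suffix=".search.msn.com",
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    suffix is an assumed crawler domain, not taken from the logs.
    reverse/forward can be swapped out for offline testing.
    """
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False
    if not hostname.lower().endswith(suffix):
        return False
    try:
        addresses = forward(hostname)[2]   # A lookup: hostname -> IPs
    except OSError:
        return False
    # Only trust the hostname if it maps back to the original IP.
    return ip in addresses


if __name__ == "__main__":
    print(is_verified_crawler("65.55.106.191"))
```

The forward step matters because anyone who controls the reverse zone for their own IP range can make it claim any hostname, but cannot make the real domain resolve back to their address.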
We also get lots of hits from "msnbot/1.1", which as far as I can see does obey robots.txt. For example, this is a valid request:
65.55.209.228 - 0 [16/May/2009:16:13:26 +0100] "GET /profile/3860?expand=1 HTTP/1.1" 200 39533 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
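To compare the two bot versions across a whole log rather than by eye, something like the following could tally requests to disallowed paths per user agent. It is a rough sketch, not a full log parser: the regular expression is written against the log lines shown in this post, `DISALLOWED_PREFIXES` contains only the rules quoted above, and `count_violations` is a name invented for the example. It assumes each log entry sits on a single line.

```python
import re
from collections import Counter

# Matches the log layout shown in this post:
# ip - 0 [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

# Only the Disallow rules quoted in the post; elided rules are not included.
DISALLOWED_PREFIXES = (
    "/mapbrowse.php", "/map/", "/mapper/", "/maplarge.php",
    "/mapprint.php", "/editimage.php", "/browse.php",
)


def count_violations(lines):
    """Count requests to disallowed paths, keyed by the agent's first token."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the expected layout
        if match.group("path").startswith(DISALLOWED_PREFIXES):
            agent_token = match.group("agent").split(" ", 1)[0]
            counts[agent_token] += 1
    return counts


if __name__ == "__main__":
    import sys
    for agent, hits in count_violations(sys.stdin).most_common():
        print(hits, agent)
```

Run it with the access log on standard input. Plain prefix matching is used here because the quoted rules contain no wildcards.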