MSN Search Bot trying to attack my site!

  • Question

  • I saw this in my log:

    65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /robots.txt HTTP/1.1" 404 275
    65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /admin/ HTTP/1.1" 401 468

    Oh, it's just someone checking whether my website is insecure. It isn't, but have fun looking.

    Wait a second. I'll just check that IP... it looks a bit weird.



    65.55.210.11 resolves to msnbot-65-55-210-11.search.msn.com

    OrgName:    Microsoft Corp 
    OrgID:      MSFT
    Address:    One Microsoft Way
    City:       Redmond
    StateProv:  WA
    PostalCode: 98052
    Country:    US
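    A PTR record is set by whoever controls the address block, so a reverse lookup like the one above proves little on its own. The usual check is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch of that idea, not Microsoft's documented verification procedure. `verify_crawler_ip` and `hostname_matches` are hypothetical helpers, the `.search.msn.com` suffix is assumed from the hostname in the log, and the resolver functions are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Shape of socket.gethostbyaddr() / socket.gethostbyname_ex() results:
# (hostname, alias_list, ip_address_list)
HostInfo = Tuple[str, List[str], List[str]]


def hostname_matches(hostname: str, suffixes: Iterable[str]) -> bool:
    """True if hostname ends with one of the trusted suffixes (case-insensitive)."""
    name = hostname.lower().rstrip(".")
    return any(name.endswith(suffix.lower()) for suffix in suffixes)


def verify_crawler_ip(
    ip: str,
    # Assumption: suffix taken from the msnbot hostname in the log above.
    suffixes: Iterable[str] = (".search.msn.com",),
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        _name, _aliases, forward_ips = forward(hostname)
    except OSError:
        return False
    # The PTR name alone proves nothing; it must resolve back to the same IP.
    return ip in forward_ips
```

    Calling `verify_crawler_ip("65.55.210.11")` with the default resolvers performs live DNS lookups, so its result reflects today's records rather than those from 2008.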


    WHAT DOES MSN SEARCH THINK IT IS DOING!? I don't have an /admin/ folder, and I NEVER have. I don't even have a robots.txt.

    This is potentially illegal under UK law, MSN Admins won't reply to me, and the feedback forms are ignored.

    Do I have to take MSN to court over this or something?

    Dug Stokes

    Thursday, June 12, 2008 11:42 PM

All replies

  • Hi,

    I am not sure who you tried to contact before. However, I will look into this and get back to you today.

    Brett

    Friday, June 13, 2008 2:50 PM
  • Hi Brett,

    Now, I'm not sure if you've noticed this, but I had a long look into it and discovered that you lied to me.

    You told me you'd look into it and get back to me "today"...

    You didn't even get back to me "tomorrow".

    You didn't even get back to me "the day after tomorrow".

    Brett... Why did you lie to me?

    Was that a Microsoft Day?

    I know how long copying a file can take; when it says a day, I imagine it means a lifetime.

    Does that mean you'll never reply?

    Brett? Brett?

    Your customer support really sucks, Brett.

    I guess I'll just have to put up with Microsoft constantly trying to hack my website, delete all my files and ridicule my children.

    If I had children, which I don't.

    Mainly because I don't want them to experience your crappy customer services.

    Now, will SOMEONE please get back to me on this subject before I start being REALLY sarcastic.

    There is NO EXCUSE for a search engine trying to spider an /ADMIN/ page on ANY website, especially if it's never existed there in the first place, and even if it does exist, perhaps it's not designed for your little search engine children daemons.

    (No, I'm not calling your children demons... It's a technical term. Heh, I guess since you work for Microsoft you might not recognise technical terms such as that, but just for the record, I wasn't insulting your children.)

    I don't have any other search engine doing this. Why yours?

    I'd even go as far as to suggest Slashdot and other news sites may be interested in this one! Hey! Why don't you put it on MSN News? It might be interesting for once.

    Y'know, I don't expect a reply, but it would be nice to know why your web search tries to hack my site once a week. You didn't have to say you'd get back to me, but when you say you will the same day, that usually means keeping the promise.

    If you can't keep it, don't make it.

    Cheers,

    Dug Stokes


    Monday, June 16, 2008 1:05 PM
  • Hi,

    I am waiting on a response from my management. They usually get back to me within the day, so I had no reason to expect otherwise. Please feel free to fill out our Live Search Site Owner support form in the meantime.

    Brett

    Monday, June 16, 2008 7:02 PM
  • Dug,

    You're asking for assistance and to receive a response back on the same day from M$?

    Helloooooo, what are you thinking? 

    You should know by now they just design the software and implement it. They have no idea how to fix it or tell you what's wrong with it. Well, they know what's wrong with it; they just deny anything is wrong.
    Monday, June 16, 2008 8:18 PM
  • I have placed the HTML codes in and have done everything that is asked, but my site is still not listed... It is, however, listed in the other major search engines... Why is it not listed in MSN??? My URL:

    http://www.dictionaryfordads.com

    Any help you can provide for me would be appreciated

    Tuesday, June 17, 2008 12:34 AM
  • Thanks for finally getting back to me, Brett. Don't worry that you promised a same-day response but didn't get back to me for a few days.

    Just one question about this 'bug' in your search spider that may be infringing UK law by accessing web directories that are secured and not accessible, which your spider insists on accessing anyway:

    Do you think I'd be better off going to the national or the international press about this?

    Thanks, man, they're gunna have a field day with this story.

    Dug Stokes
    Tuesday, June 17, 2008 6:25 AM
  • Dug,

    Can you please provide me with your domain name? We are continuing to research this, but I need more information than what you have provided.

    Thanks,

    Brett

    Tuesday, June 17, 2008 4:41 PM
  • My domain is http://www.dictionaryfordads.com
    Tuesday, June 17, 2008 8:46 PM
  • Brett,

    I've provided you with my logs, exact dates and times, and my IP address.

    My server's IP: 64.22.124.200

    I can't tell you which domain it is - I don't have that logged, but from the search queries it'd be:
    monkeyboi.com

    Here, have some more logs attacking my site:

    (This is unedited from: egrep 65.55.210.11 apache2/*)

    apache2/access_log:65.55.210.11 - - [08/Jun/2008:20:33:39 +0100] "GET /admin/ HTTP/1.1" 404 468
    apache2/access_log:65.55.210.119 - - [09/Jun/2008:09:24:51 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.119 - - [09/Jun/2008:09:24:52 +0100] "GET / HTTP/1.1" 302 -
    apache2/access_log:65.55.210.119 - - [09/Jun/2008:09:27:43 +0100] "GET /blog/ HTTP/1.1" 200 4708
    apache2/access_log:65.55.210.118 - - [10/Jun/2008:13:31:36 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.118 - - [10/Jun/2008:13:31:36 +0100] "GET /blog/archives/34/feed HTTP/1.1" 200 445
    apache2/access_log:65.55.210.11 - - [10/Jun/2008:22:17:14 +0100] "GET /robots.txt HTTP/1.1" 404 275
    apache2/access_log:65.55.210.11 - - [10/Jun/2008:22:17:14 +0100] "GET /admin/ HTTP/1.1" 404 468
    apache2/access_log:65.55.210.119 - - [11/Jun/2008:12:09:03 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.119 - - [11/Jun/2008:12:09:04 +0100] "GET / HTTP/1.1" 302 -
    apache2/access_log:65.55.210.119 - - [11/Jun/2008:12:10:56 +0100] "GET /blog/ HTTP/1.1" 200 4708
    apache2/access_log:65.55.210.111 - - [12/Jun/2008:10:40:29 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.111 - - [12/Jun/2008:10:40:30 +0100] "GET /blog/archives/date/2008/02 HTTP/1.1" 200 1884
    apache2/access_log:65.55.210.11 - - [12/Jun/2008:16:20:28 +0100] "GET /robots.txt HTTP/1.1" 404 275
    apache2/access_log:65.55.210.11 - - [12/Jun/2008:16:20:28 +0100] "GET /links.htm HTTP/1.1" 200 5338
    apache2/access_log:65.55.210.112 - - [12/Jun/2008:18:41:36 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.112 - - [12/Jun/2008:18:41:36 +0100] "GET /blog/archives/33/trackback HTTP/1.1" 302 26
    apache2/access_log:65.55.210.112 - - [12/Jun/2008:18:44:59 +0100] "GET /blog/archives/33 HTTP/1.1" 200 3040
    apache2/access_log:65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /robots.txt HTTP/1.1" 404 275
    apache2/access_log:65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /admin/ HTTP/1.1" 404 468
    apache2/access_log:65.55.210.119 - - [13/Jun/2008:13:49:55 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.119 - - [13/Jun/2008:13:49:56 +0100] "GET / HTTP/1.1" 302 -
    apache2/access_log:65.55.210.119 - - [13/Jun/2008:13:51:12 +0100] "GET /blog/ HTTP/1.1" 200 4708
    apache2/access_log:65.55.210.11 - - [15/Jun/2008:02:15:28 +0100] "GET /robots.txt HTTP/1.1" 404 275
    apache2/access_log:65.55.210.11 - - [15/Jun/2008:02:15:28 +0100] "GET /admin/ HTTP/1.1" 404 468
    apache2/access_log:65.55.210.110 - - [15/Jun/2008:02:31:16 +0100] "GET /robots.txt HTTP/1.1" 200 50
    apache2/access_log:65.55.210.110 - - [15/Jun/2008:02:31:19 +0100] "GET /blog/archives/date/2007/10 HTTP/1.1" 200 1390
    apache2/access_log:65.55.210.119 - - [15/Jun/2008:19:41:31 +0100] "GET /robots.txt HTTP/1.1" 200 50
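    One thing worth knowing about the command above: `egrep 65.55.210.11` is an unanchored regular expression, so it also matches 65.55.210.110 through 65.55.210.119 (and each `.` matches any character). That is why this output mixes several crawler addresses. With GNU grep, `-F -w` gives a literal whole-word match instead. Below is a minimal Python sketch that parses each line's client IP, compares it exactly, and groups the requested paths by IP. `parse` and `paths_by_ip` are hypothetical helpers, the regex assumes the Apache common-log layout shown above (not a general log parser), and the sample lines are copied from this output.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Apache common-log-format line, optionally prefixed with the "file:"
# that grep adds when it searches several files (as in the output above).
LOG_LINE = re.compile(
    r'^(?:[^:\s]+:)?'                         # optional "apache2/access_log:"
    r'(?P<ip>\S+) \S+ \S+ '                   # client IP, ident, user
    r'\[(?P<time>[^\]]+)\] '                  # [13/Jun/2008:00:22:34 +0100]
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '  # "GET /admin/ HTTP/1.1"
    r'(?P<status>\d{3}) (?P<size>\S+)'        # 404 468
)


def parse(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def paths_by_ip(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (path, status) pairs by exact client IP string."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        record = parse(line)
        if record is not None:
            grouped[record["ip"]].append((record["path"], record["status"]))
    return dict(grouped)


# A few lines copied from the egrep output above.
SAMPLE = [
    'apache2/access_log:65.55.210.11 - - [10/Jun/2008:22:17:14 +0100] "GET /robots.txt HTTP/1.1" 404 275',
    'apache2/access_log:65.55.210.11 - - [10/Jun/2008:22:17:14 +0100] "GET /admin/ HTTP/1.1" 404 468',
    'apache2/access_log:65.55.210.119 - - [11/Jun/2008:12:09:03 +0100] "GET /robots.txt HTTP/1.1" 200 50',
    'apache2/access_log:65.55.210.119 - - [11/Jun/2008:12:09:04 +0100] "GET / HTTP/1.1" 302 -',
]

for ip, hits in sorted(paths_by_ip(SAMPLE).items()):
    print(ip, hits)
# 65.55.210.11 [('/robots.txt', '404'), ('/admin/', '404')]
# 65.55.210.119 [('/robots.txt', '200'), ('/', '302')]
```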


    Hmm, looks like you're doing this more than once a day. This is practically a DDoS and exploitation attack.
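    A rate claim like this is easier to judge with a count per day. Below is a minimal Python sketch, assuming the same `[dd/Mon/yyyy:HH:MM:SS +zzzz]` timestamps as the log excerpt above; `requests_per_day` is a hypothetical helper and the sample lines are copied from that excerpt. Tallied the same way, the full excerpt comes to 28 requests between 8 and 15 June, with the busiest day (12 June) at seven requests from three different crawler addresses.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable

# Captures "12/Jun/2008" from "[12/Jun/2008:10:40:29 +0100]".
DAY = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def requests_per_day(lines: Iterable[str]) -> Counter:
    """Count log lines per calendar day (in the server's logged timezone)."""
    counts: Counter = Counter()
    for line in lines:
        match = DAY.search(line)
        if match:
            # %b expects English month abbreviations (the default C locale).
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            counts[day] += 1
    return counts


SAMPLE = [
    '65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /robots.txt HTTP/1.1" 404 275',
    '65.55.210.11 - - [13/Jun/2008:00:22:34 +0100] "GET /admin/ HTTP/1.1" 404 468',
    '65.55.210.119 - - [13/Jun/2008:13:49:55 +0100] "GET /robots.txt HTTP/1.1" 200 50',
    '65.55.210.11 - - [15/Jun/2008:02:15:28 +0100] "GET /robots.txt HTTP/1.1" 404 275',
]

for day, n in sorted(requests_per_day(SAMPLE).items()):
    print(day.isoformat(), n)
# 2008-06-13 3
# 2008-06-15 1
```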

    I have given you ALL the information I have on this, and you told me you'd get back to me the same day. Are you just going to tell me it's not a problem? Or to ignore it? I suggest you do something, and soon.

    Get back to me or I'm contacting my local press office to see if they're interested in hearing that Microsoft's search engine checks the same pages every day, looking for open admin sections of websites.

    Brett

    P.S. KB (dictionaryfordads): you WON'T get an answer by posting in an unrelated forum topic. It's rude and makes no sense. Start your own topic!
    Wednesday, June 18, 2008 11:49 AM
  • Another one this morning, Brett.

    access_log:65.55.210.11 - - [19/Jun/2008:08:55:54 +0100] "GET /admin/ HTTP/1.1" 404 468

    This is getting really stupid, Brett.

    Either get back to me and fix it, or tell me: what do you suggest I do to stop this?

    I'm getting a little annoyed you can't be bothered to reply to my messages.

    I'll give you until the end of the day today to get back to me.


    I suggest you reply before then.
    Thursday, June 19, 2008 1:41 PM
  • You should read a little about robots.txt. It is the file you use to keep search engine bots from indexing parts of your site...

    http://www.robotstxt.org/faq.html
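    For reference, this is how a crawler that honours robots.txt interprets a rule, sketched with Python's standard `urllib.robotparser` module. It illustrates the exclusion protocol in general, not MSNbot's own implementation; the `Disallow: /admin/` rule, the example.com URLs and the `build_parser` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules: ask every crawler to stay out of /admin/.
rules = build_parser("User-agent: *\nDisallow: /admin/\n")
print(rules.can_fetch("msnbot", "http://www.example.com/admin/"))  # False
print(rules.can_fetch("msnbot", "http://www.example.com/blog/"))   # True

# No rules at all: every path is allowed. RobotFileParser.read() likewise
# allows everything when robots.txt comes back with a 4xx other than 401/403.
empty = build_parser("")
print(empty.can_fetch("msnbot", "http://www.example.com/admin/"))  # True
```

    Note that robots.txt is itself publicly readable, so a Disallow line also advertises the path it names; it is a request to well-behaved crawlers, not an access control.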

     

     

    Saturday, September 6, 2008 10:45 PM
  • Frederic,

    You appear to be suggesting the following:

    1. That I list a potentially administrative folder in my robots.txt
    (which is generally considered a bad idea... as it's an administrative folder)

    2. That I list a folder as 'disallowed' in my robots.txt FOR A DIRECTORY THAT DOES NOT EVEN EXIST.

    Perhaps you should read this whole forum again before making such simplistic suggestions.

    MSN Search's bot is just randomly querying the /admin/ folder, which has never existed, and admin folders should not be listed in robots.txt. As for listing a location that has never existed, well, that just seems stupid.

    Perhaps you'd like to think again? I notice MSN's tech guys can't be bothered to fix the problem.
    Sunday, September 7, 2008 9:28 AM
  • Copied from mail sent to (dug at frag .co . uk...) from Jeremiah Andrick dated June 26, 2008:

    Dug,

    I was made aware of your concern through Brett on our webmaster forums as well as from one of my support engineers. Let me repeat your concern to ensure I didn’t miss anything, and then I will give you the details of my investigation.

    You saw the following type of requests coming from MSNbot:

    apache2/access_log:65.55.210.11 - - [18/Jun/2008:20:33:39 +0100] "GET /admin/ HTTP/1.1" 404 468
    apache2/access_log:65.55.210.118 - - [20/Jun/2008:13:31:36 +0100] "GET /blog/archives/34/feed HTTP/1.1" 200 445
    apache2/access_log:65.55.210.11 - - [20/Jun/2008:22:17:14 +0100] "GET /robots.txt HTTP/1.1" 404 275

    Your concern is that we were crawling for:

    1. Files that don’t exist

    2. Creating a Denial of Service Attack or other threat with the bot.

    Please let me know if I missed something.

    We did some investigation in our logs and this is what we found:

    Your site is hosted at the following IP: (Removed for security reasons). When we look, we also see several other sites hosted at the same IP address, including www.britelets.com. I assume this is another site you either own or manage, as the comment “site designed by frag.co.uk” appears at the bottom of each page. You also link to this site in the portfolio section of your site, http://frag.co.uk/portfolio.html.

    When I looked through the logs you sent, I checked britelets.com for the pages highlighted above and found this:

    http://www.britelets.com/blog/archives/34/

    http://www.britelets.com/admin/

    We actually found links to http://www.britelets/admin/ on 21 of the 27 pages we have from www.britelets.com.

    · In response to your concern that we were crawling for pages that don’t exist: in this instance, it appears that is not what is occurring. We are looking for files on the host machine which do exist, based on the investigation above, but are part of a different site on the same IP address. We don’t just look at the TLD when crawling but also leverage the IP, which is why you see the requests.

    · In looking for your admin page, we are not attempting to breach the security of your site, but instead following links which are clearly visible on the page. Please note that http://britelets.com/contact.htm has a link to your "administration" console. Your admin page produces a dialog box for sign-in and does not respond with a page to the crawler, thus the 404 returned in the request.

    · The calls for robots.txt are standard calls to see if the file exists. We do this so we can honor it if it exists. We usually run this check a few times.

    I am sorry if these crawls caused you any concern. You can fix the problem in multiple ways, but I would recommend starting with directing bot traffic with a robots.txt file. There is a great article on the robots exclusion protocol, which we honor, at http://janeandrobot.com/post/Managing-Robots-Access-To-Your-Website.aspx

    Please let me know if you have any other concerns about this issue or if I missed anything. We strive to be careful when we crawl and want to improve the process for webmasters.

    Thanks

    Jeremiah Andrick

    Program Manager | Microsoft Corporation | Live Search Developer Tools

    End of copied message

    Friday, September 12, 2008 4:25 PM
  • Can I say... Dug got owned?

    What an obnoxious person... and on top of that, he's completely wrong. Too bad he didn't take his story to the press so more people could rofl.
    Tuesday, October 28, 2008 1:37 PM