Please help, msnbot is practically attacking my site - pages being recached every few seconds

  • Tuesday, February 10, 2009 11:44 AM, friskers
     
    For some reason, msnbot has a fascination with my /video/ directory, and it turned out to be the reason our site has been using so much bandwidth lately. Since Feb 1st, msnbot has requested the /videos/ page over 240,000 times, using over 25GB of bandwidth. I have changed robots.txt to disallow it from indexing the folder, but it still persists. Here is a clipping of our raw access log, for just the last 10 minutes.

    http://www.rpgamers.net/accesslog.txt

    If anyone can advise why it has such great interest in this page, and how to allow it to index the page without continuously flooding it, I would appreciate your recommendations.

    Thank you.
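
    Figures like the request count and bandwidth total above can be reproduced from a raw access log. The following is a minimal sketch, assuming the server writes Apache's Combined Log Format; the function name `summarize_bot_traffic` and the sample lines (including their IPs, sizes, and user-agent strings) are made up for illustration and are not taken from the linked log.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_bot_traffic(lines, agent_substring="msnbot"):
    """Count hits and bytes per path for requests whose user agent matches."""
    hits = Counter()
    volume = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or agent_substring not in match.group("agent").lower():
            continue
        path = match.group("path")
        hits[path] += 1
        if match.group("bytes") != "-":  # "-" means no body was sent
            volume[path] += int(match.group("bytes"))
    return hits, volume


# Hypothetical sample lines for illustration only.
SAMPLE = [
    '203.0.113.7 - - [10/Feb/2009:11:30:01 -0500] "GET /videos/ HTTP/1.1" '
    '200 1000 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '203.0.113.7 - - [10/Feb/2009:11:30:04 -0500] "GET /videos/ HTTP/1.1" '
    '200 1000 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.9 - - [10/Feb/2009:11:30:05 -0500] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"',
]

hits, volume = summarize_bot_traffic(SAMPLE)
for path, count in hits.most_common():
    print(path, count, "requests,", volume[path], "bytes")
```

    Matching on the user-agent string only shows what a client claims to be; it does not prove the requests really come from the search engine's crawler.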

Answers

  • Wednesday, February 11, 2009 8:34 PM, Brett Yount
     Answer
    Hi,

    My apologies for the issues you are experiencing. I'll get our team to look at this and post back as soon as I get a response. Also, have you tried setting a crawl delay? That might help.


    Brett
    Program Manager, Live Search Webmaster Tools
    • Marked As Answer by Brett Yount, Wednesday, February 25, 2009 4:57 PM
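
    For readers unsure what a crawl delay looks like in practice, here is a minimal sketch. It assumes a robots.txt that combines a `Crawl-delay` line with a `Disallow` rule for msnbot, and checks how Python's standard-library `urllib.robotparser` reads it. The delay of 10 seconds, the `/videos/` path, the `www.example.com` host, and the helper name `check_rules` are illustrative assumptions. `Crawl-delay` is a non-standard directive, so this only confirms that the file says what you intend, not that any crawler will honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; values are examples, not recommendations.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /videos/
"""


def check_rules(robots_txt, agent, url):
    """Return (allowed, crawl_delay) as the standard-library parser sees them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


for url in ("http://www.example.com/videos/", "http://www.example.com/index.html"):
    allowed, delay = check_rules(ROBOTS_TXT, "msnbot", url)
    print(url, "allowed:", allowed, "crawl delay:", delay)
```

    A check like this helps rule out a syntax mistake in robots.txt before concluding that the crawler is ignoring the file.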

All Replies

  • Tuesday, February 10, 2009 5:30 PM, friskers
     
    Okay, the bot has requested robots.txt various times since I first posted, and it is clearly refusing to listen to it.

    I have now denied the offending IP at the .htaccess level.

    A response on what the heck is going on would be good, or a link to where to report this abusive behaviour.

    The bot has actually increased in hostility overnight, demanding pages more often now.

    65.55.25.142 is the offending IP.

  • Thursday, May 28, 2009 12:45 PM, brightergraphics
     
    Hi Brett,

    I searched for Crawl Delay on this forum, and although I've looked at several threads, they all seem to end with you recommending setting the crawl delay to a number of seconds. Well, over the last few years I've tried setting the crawl delay to values from 5 to 250000, and the MSN bot still hits my site every minute or so.

    I wouldn't mind but it's sucking the guts out of my bandwidth with very few hits from live.com or our many ads placed with Adcenter.

    Does your bot actually read the robots.txt file? Or should I rename it to something else your bot will recognise?

    Is it true that if I disallow the msnbot it will still spider my site?

    Yours, looking for answers.....

    Chris