Index problem due to REP RRS feed

  • Question

  • As far as I can tell, my robots.txt and sitemap are set up so that the site should be indexed correctly, but I am new to site design and may have missed something.  The site URL is www.k12reader dot com.  Here's the message I get when I check under crawl issues with BlockedByRep as the type for k12reader dot com:

    "Live Search has found this URL, however, it has been prevented from being indexed by the robots exclusion protocol (REP), either in the Robots.txt file, through a page meta tag, or an HTTP header attribute. We recommend you review this list periodically to ensure that they are not inadvertently blocking content you would like indexed."

    Any idea why MSN Live isn't able to index the url? 




    Friday, August 15, 2008 7:34 PM

Answers

  • That will work to cover all pages below /wp

     

    Monday, August 25, 2008 3:47 PM

All replies

  • Anyone?
    Tuesday, August 19, 2008 5:40 PM
  • Okay, first a disclaimer: I don't know the syntax for the robots.txt file, so I could be out to lunch on this reply.

     

    I checked the main pages and they all have "follow" in the meta tags, so that's probably not the problem.  I don't know what the message means by a "header attribute".
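
A note on the check described above: "follow" only concerns whether links on a page are followed; the directive that keeps a page out of the index is "noindex". The "HTTP header attribute" in the message most likely refers to an X-Robots-Tag response header, though that is an assumption, since the message does not name it. A minimal Python sketch that looks for "noindex" in both places; the function and class names are made up for illustration, and the HTML and headers are passed in rather than fetched:

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocks_indexing(html, headers):
    """Return True if a meta tag or the X-Robots-Tag header says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    # Assumption: the header in question is X-Robots-Tag.
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in finder.directives + header_directives


# "follow" alone does not block indexing; "noindex" does.
print(blocks_indexing('<meta name="robots" content="index, follow">', {}))
print(blocks_indexing('<meta name="robots" content="noindex, follow">', {}))
print(blocks_indexing("<html></html>", {"X-Robots-Tag": "noindex"}))
```
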

     

    I looked at the robots.txt file.  There is a line in it that bothers me, but this could simply be because I don't know the syntax.  The line is in the second grouping of "instructions to the bots":

     

    User-agent: *

    Disallow: /wp-

     

    I suppose that the first line means that all bots should pay attention to the following instructions.  The second line looks wonky, in particular the ending hyphen.  My best guess would be that this should be

     

    Disallow: /wp/

     

    That would have it match the syntax of all the other lines.  If this is faulty syntax, it could be throwing the msnbot off, so I'd suggest changing it to

     

    Disallow: /wp/

     

    and see what happens.

     

    ... Duane

     

    Wednesday, August 20, 2008 12:34 PM
  • Thanks for taking a look.  The wp- refers to the beginning of a WordPress file name like wp-admin.php.  I'll take a look, though, and see if there's another way to disallow those files.
    Thursday, August 21, 2008 7:22 PM
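
For anyone comparing the two rules discussed so far: in the basic robots.txt syntax a Disallow value is matched as a path prefix, so /wp- and /wp/ block different sets of URLs. A small sketch using Python's standard urllib.robotparser shows the difference. The helper name and the sample paths are hypothetical, and this only shows how Python's parser reads the rule, not how msnbot itself behaves:

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow_rule, paths, agent="msnbot"):
    """Return the paths that a single 'User-agent: *' Disallow rule blocks."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical paths, chosen only to illustrate the matching.
paths = ["/wp-admin.php", "/wp-content/themes/", "/wp/page.html", "/worksheets/"]

print(blocked_paths("/wp-", paths))  # paths starting with "/wp-"
print(blocked_paths("/wp/", paths))  # paths inside a "/wp/" directory
```

Under this reading, /wp- covers wp-admin.php and anything under wp-content, while /wp/ covers only URLs inside a directory literally named wp.
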
  • As I said, I don't have a lot of experience with robots.txt syntax, but /wp- on its own doesn't look right.

     

    If you are trying to exclude all the WordPress .php pages, something like this might work:

     

    Disallow: /wp-*.php$

     

    This is my source for syntax:

     

    http://www.searchtools.com/robots/robots-txt-elements.html#disallow
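
The * and $ characters in the pattern above are extensions to the original robots.txt syntax, so whether a given crawler honors them should be checked in that crawler's documentation. As a rough sketch of the common interpretation (* matches any run of characters, a trailing $ anchors the end of the URL), here is a small Python function; its name is made up for illustration and it is not taken from any crawler's implementation:

```python
import re


def rule_matches(rule, path):
    """Check a path against a rule using the '*' and '$' extensions."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn each escaped '*' back into "match anything".
    pattern = re.escape(rule).replace(r"\*", ".*")
    if anchored:
        pattern += "$"
    # re.match anchors at the start, so an unanchored rule is a prefix match.
    return re.match(pattern, path) is not None


print(rule_matches("/wp-*.php$", "/wp-admin.php"))
print(rule_matches("/wp-*.php$", "/wp-content/themes/"))
print(rule_matches("/wp-", "/wp-content/themes/"))
```

Read this way, the wildcard rule matches only URLs that start with /wp- and end in .php, which is narrower than the plain /wp- prefix.
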

     

    ... Duane

    Thursday, August 21, 2008 7:47 PM
  • That will work to cover all pages below /wp

     

    Monday, August 25, 2008 3:47 PM