MSN/Live indexing content marked as "noindex"


All replies




    Could you please list the URLs that should not have been indexed?






    Wednesday, December 17, 2008 7:16 PM
  • Sure, for example this one or maybe this one. You will find more if you look at the pages. :-)
    Wednesday, December 17, 2008 7:36 PM
  • I am sorry, but I am unable to look at every page (400+) to verify if it should or shouldn't be indexed. I tried looking at your robots.txt file for agentdb, but it was blank.


     You can also use our support form located at: https://support.live.com/eform.aspx?productKey=wlsearchcontentremoval&ct=eformts to request permanent URL/directory removal.


    Wednesday, December 17, 2008 8:09 PM
  • The point I was trying to make is that there is a <meta name="robots"> tag in each of these pages. This meta tag should make your crawler NOT index these pages.

    The fact that robots.txt is blank only means that you can crawl the whole site, but of course you should still observe the <meta name="robots"> tags on individual pages.
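
    To illustrate the robots.txt half of this, here is a minimal sketch using Python's standard urllib.robotparser. The helper name can_crawl and the example.com URLs are made up for illustration, and this says nothing about how Live's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `can_crawl` is a hypothetical helper name, not part of any crawler.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A blank robots.txt contains no rules, so every URL may be crawled.
blank_allows_all = can_crawl("", "http://example.com/any/page.html")

# A Disallow rule blocks crawling of the matching path prefix only.
rules = "User-agent: *\nDisallow: /private/\n"
private_blocked = not can_crawl(rules, "http://example.com/private/page.html")
public_allowed = can_crawl(rules, "http://example.com/public/page.html")
```

    Note that this only answers whether a page may be fetched. Whether a fetched page may be indexed is a separate question, decided by the page's own <meta name="robots"> tag.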

    Thursday, December 18, 2008 8:12 PM
  • So all of your pages are noindex, nofollow? If that's the case, I can remove your entire site from our index. Otherwise, I don't have the resources to verify which URLs should stay and which should not, which is why I provided the link to our URL removal request form. If you would like me to investigate pages on an individual basis, please list them. Also, what is your reasoning for not adding these to your robots.txt file? Why have a file at all if you intend to leave it blank?





    Thursday, December 18, 2008 8:39 PM
  • Brett,

    There are two ways to exclude pages from search engine indices.

    1. robots.txt. You put a URI in there and the search engines will not crawl it.

    2. <meta name="robots"> as described here. This lets you control if search engines should index your content and follow any links from that page.

    The point I'm trying to make is that Live does not handle <meta name="robots"> correctly. I am sure it does handle robots.txt correctly, but not <meta name="robots"> since many pages that are marked as <meta name="robots" content="noindex"/> are included in Live's index.
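
    As a rough sketch of what honoring the tag involves, the following uses Python's standard html.parser to read the directives from a page and decide whether it may be indexed. The names RobotsMetaParser, robots_directives and may_index are invented for this example; it is an illustration of the expected behavior, not a description of any search engine's code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html: str) -> bool:
    """False if the page asks not to be indexed ("none" means noindex, nofollow)."""
    directives = robots_directives(html)
    return "noindex" not in directives and "none" not in directives


page = '<html><head><meta name="robots" content="noindex"/></head><body>Hi</body></html>'
page_excluded = not may_index(page)
```

    A crawler following this logic would still be free to fetch the page, since robots.txt allows it, but would leave it out of the index.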
    Friday, December 19, 2008 10:56 AM
  • Brett,

    Is there any update on this?



    Saturday, December 27, 2008 8:52 PM