Disallowed pages being indexed and sitemap being ignored

  • Question

  • My website's been up since toward the end of last year: www.eraofdata.com

    It gets regularly crawled and indexed by some other search engines I don't need to mention.

    For a couple of months I've been looking at why about 97% of my search engine referrals come from one particular engine and 0% come from Live Search, but I haven't been able to figure it out, hence this post. (The last significant update to the site was in March, although I made some tweaks about a month ago.)

    According to Webmaster Tools, only two items are indexed: my root index page and one other 'page', a gzipped sitemap file.

    1. I've really got no idea why such a file is getting crawled, so I excluded it in robots.txt a few weeks back. It's still getting crawled, but only by Live Search.

    2. I can't establish why Live Search is indexing only the root page and none of the other content. That content is explicitly listed in the sitemap file, and other search engines are picking it up from there.

    For reasons related to the site's content, it would be appropriate if Live Search could crawl all the pages listed in the sitemap from time to time.
    ajmer dhariwal || eraofdata.com
    Thursday, May 14, 2009 8:20 AM

All replies

  • I've renamed the /blog/sitemap.xml.gz file that was explicitly disallowed in robots.txt, in the hope that this lets the correct sitemap file in the root folder be crawled and, in turn, lets the site's content be fully crawled. I can only assume the file's name was confusing the crawler into treating it as a 'proper' sitemap file, which it then failed to read, so nothing else got indexed.
    ajmer dhariwal || eraofdata.com
    Tuesday, May 19, 2009 6:03 AM
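
For anyone hitting the same problem, one way to rule out a mistake in robots.txt itself is to test the rules locally. The sketch below uses Python's standard urllib.robotparser (Python 3.8 or later for site_maps()). The robots.txt content, the root sitemap URL and the "msnbot" user-agent string are assumptions for illustration only; the thread does not show the actual file or name the sitemap in the root folder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the poster's real file is not shown in the thread.
# The root sitemap URL below is likewise an assumed name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/sitemap.xml.gz

Sitemap: http://www.eraofdata.com/sitemap.xml
"""


def check_rules(robots_txt, agent, urls):
    """Return ({url: may_fetch}, declared_sitemaps) for the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    verdicts = {url: parser.can_fetch(agent, url) for url in urls}
    return verdicts, parser.site_maps()


# "msnbot" is assumed to be the crawler's user-agent token.
allowed, sitemaps = check_rules(
    ROBOTS_TXT,
    "msnbot",
    [
        "http://www.eraofdata.com/",
        "http://www.eraofdata.com/blog/sitemap.xml.gz",
    ],
)

for url, ok in allowed.items():
    print(("allowed   " if ok else "disallowed") + " " + url)
print("sitemaps declared:", sitemaps)
```

If the rule parses as intended, the gzipped file under /blog/ is reported as disallowed and the root page as allowed. This only checks how a standards-following parser reads the rules; it says nothing about how a particular search engine treats URLs it already has in its index.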