Not respecting noindex, sitemaps

  • Question

  • First, I think MSN Search folks should have a chat with the MSN Communities folks, because these forums stink.  Seriously, on IE7 I get at least 3 JavaScript errors on every load.  On about 50% of attempts to view or post I get:

     

    We apologize, but an unknown error has occurred in the forums.

    This error has been logged.

     

    I'd hate to see those logs.  And then on about another 30% I get no editable text area to post to.  And I've posted this message about 15 times so far with no luck yet.

     

    Now, to my real problem.  I run povo.com.  We are a local search site.  We have somewhere around 300,000 "real" pages that should be in the index.  Google has about 350,000, because they correctly index some meta-list-type pages that we put in our sitemap.  MSN Search on the other hand... 160 results.  Never mind that the site has been up for almost a year, or that for just one of our subdomains (boston.povo.com), Webmaster Center reports 2,370 indexed pages.

     

    And here's the even more ridiculous part.  Of the first 10 results of a site:povo.com search:

     

    1 redirects (301) to another one of the top 10

    1 301 redirects to another page not on the list

    6 are NOINDEX and have been for a month or more

     

    So 8 out of 10 of the top results for our site should not be in your index at all, and somewhere around 299,840 are completely missing.  That's not so good.

     

    How can I help your crawler/indexer not be so bad?

     

    Wednesday, November 26, 2008 3:30 PM
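
The audit described above (checking whether each top result is a redirect or carries a page-level noindex) can be repeated by hand or scripted. The following is only a minimal sketch, assuming the status code and HTML of each result have already been fetched by some other means; the names `RobotsMetaParser` and `should_be_indexed` are made up for illustration and say nothing about how any search engine actually evaluates pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def should_be_indexed(status_code, html):
    """Return (ok, reason) for one already-fetched URL (hypothetical helper)."""
    if status_code in (301, 302, 307, 308):
        return False, "redirect"
    if status_code != 200:
        return False, "http error"
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        return False, "noindex"
    return True, "indexable"
```

Running each of the ten results through `should_be_indexed` and tallying the reasons would give the same kind of breakdown as the list in the question. Note that this only looks at the meta tag; it does not check an `X-Robots-Tag` response header or robots.txt.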

Answers

  • So a couple of thoughts:

    I could not find a complete sitemap. The one linked from the robots.txt file is either partial or broken. If you get a complete one, you should submit it through the webmaster tools.

    One other thing that may be an issue is that there appears to be a lot of duplicate content. For example, the pages at:
    http://westdennis.ma.povo.com/
    http://eastdennis.ma.povo.com/

    These pages have different URLs and titles, but the remainder of the content is exactly the same. This may be causing your pages to be deduped.  I am not sure how many of your pages have this content, but it could be part of the issue. You really need more unique content for each of the sections of the site you want to be indexed.

    One question about the noindex: did you apply it at the page level or in the robots.txt file? If it is at the page level, I would like to check the pages that have it. We do our best to respect the Robots Exclusion Protocol (REP), so I want to make sure we are not doing anything outside of it.

    Jeremiah Andrick
    Sunday, November 30, 2008 6:29 AM
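
To get a rough sense of how similar two pages are before worrying about deduplication, one common generic approach is to compare overlapping word sequences ("shingles") with a Jaccard score. The sketch below is an illustration of that general technique only; the thread does not say how the search engine detects duplicates, and the function names, the shingle size of 3, and any threshold you pick are assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)
```

Feeding in the visible text of two city pages (with markup stripped) gives a score near 1.0 when only a place name differs and near 0.0 when the pages share little. Pages scoring very high against many siblings are the ones that most need unique content.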

All replies

  • Hi DJmax,

     

    Sorry for the issues you are having with the forums. We reported these issues to the techs as soon as they started occurring a few days ago.

     

    Regarding your site: I'll do some further research and get back to you either Monday or Tuesday of next week.

     

    Thanks,

     

    Brett 

     

    Wednesday, November 26, 2008 8:53 PM
  • Thanks very much for looking at this.  Overall site structure: we're a local wiki, currently operating mostly in Boston but with coverage in the rest of Massachusetts and New York City.  We have a wiki page (at a minimum) for every city in an area (e.g. Massachusetts, which has about 450).  By default, those cities have "very similar" content on them (some text is different; local news should be different but sometimes isn't there at all).  I'm not too concerned with East and West Dennis, because they are clearly not "active" yet.  I'm much more concerned about our active areas, such as boston.povo.com and cambridge.ma.povo.com, etc.

    On the sitemap... We have one sitemap for each "city", served at the URL in the robots.txt.  It's very possible I've screwed this up, though Google reads this fine.  What made you think it was broken/incomplete?  I did submit it to webmaster tools, but I'm not sure how to tell whether it was understood.

    The noindex is primarily applied at the page level, because we have to wait to purge from Google before robots.txt'ing it.  For example, all search pages are noindexed, such as

    http://boston.povo.com/?search&tags=restaurant

    Thanks again.
    --Max
    Tuesday, December 2, 2008 5:20 PM
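
One way to sanity-check the per-city sitemap setup described above is to pull the `Sitemap:` lines out of each robots.txt and confirm that the referenced file parses as XML and lists the expected number of URLs. This is a minimal sketch using only the Python standard library; it assumes the robots.txt and sitemap bodies have already been downloaded as text, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def summarize_sitemap(xml_text):
    """Return (root element name, list of <loc> values).

    Raises xml.etree.ElementTree.ParseError if the XML is malformed,
    which is one way a sitemap can be "broken".
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    kind = root.tag.replace(SITEMAP_NS, "")  # "urlset" or "sitemapindex"
    return kind, locs
```

If `summarize_sitemap` returns `urlset` with far fewer entries than the subdomain has pages, or an empty list because the namespace is missing, that would be consistent with a sitemap that looks partial to a crawler even when another engine tolerates it.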
  • I should also note, the vast majority of our pages are in these "active" areas.  We have, for example, 17,000 unique pages in boston.povo.com, as well as some lists/cross cuts on those pages.
    Tuesday, December 2, 2008 6:58 PM