Specific Problems I'm Having With Crawler - Help Please



    My site is http://www.maturefitness.com/ and I am having problems with the way it is being crawled and indexed. Specifically:


    1. The number of pages listed in Webmaster Live (5) differs from the number actually indexed (4), as returned by a search for "site:www.maturefitness.com". Today is the first day in weeks that any pages have been added.


    2. My home page (listed as "maturefitness.com", not "www.maturefitness.com", in Webmaster Live) does not appear in the index. Webmaster Live shows a last-crawled date of 1/6/2008, which is before my site opened on 1/18/2008, and a page rank of five green bars. When I clicked it in Webmaster Live, it used to show a cached image of my "under construction" page, but for the past few days it has shown the current home page (I don't know whether this is a live version or also cached).


    3. Previously, Webmaster Live said four pages were indexed but showed only three; then one was dropped, and it reported three pages indexed while showing two. However, no pages showed up in a "site:www.maturefitness.com" search, so I don't believe any were actually indexed.


    4. My robots.txt file is being ignored. You may view it here: "http://www.maturefitness.com/robots.txt". Note that the first Disallow is for "/servlet/Cart". This has kept all of the other search engines off that page; however, Live Search has added it to the index. It is listed without a page title (as "www.maturefitness.com") in Webmaster Live and shows up as indexed in a "site:www.maturefitness.com" search. This Disallow has been in my robots.txt file for months, since long before I opened the site. I have a screenshot of that search if you need it.


    5. Based on my own server logs, Live Search appears to find and read my sitemap (http://www.maturefitness.com/sitemap.xml.gz) without trouble, but does not appear to be using it to index my site. This is based solely on my reading of the logs and on comparing it with what the logs show for the other search engines, so it is not conclusive.


    6. A search on "www.maturefitness.com" does not return my site (at least not in the first 500 results, which is as deep as I have looked). The search specifically asks whether I meant "mature fitness" and returns some 8,000 links. Before you send me off to read about how to improve my rankings, I would like to mention some of the links that are listed. At about page four of this search, the first "spam to forum" link appears, advertising porn. These increase in frequency until, by about page 15, they completely dominate the listings, with very few legitimate listings interspersed. They come from all over the world, with more and more titles in East Asian characters the deeper into the results you go. They are mostly for porn, but a few Viagra and other ads do show up. Most are duplicates, and many have titles that brag about which "spam bot" was used to produce them. Many appear as links to legitimate and worthwhile forums, but some are on forums that appear to have been set up for this very purpose. A few of the links I tried point to posts that have already been removed from the forums. I believe this particular search should be of interest to those who are developing and tuning the algorithms.
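    For item 4, one way to rule out a problem in the file itself is to run the rule through Python's standard urllib.robotparser, which applies robots.txt rules as case-sensitive path-prefix matches. This is a minimal sketch, assuming the file contains only the one rule quoted above (the full file is not reproduced in this thread, so ROBOTS_TXT below is a hypothetical excerpt). If the parser reports the Cart URL as blocked, the rule is at least sound as written.

```python
from urllib.robotparser import RobotFileParser

# Assumption: only the rule quoted in the post is reproduced here;
# the live file at http://www.maturefitness.com/robots.txt may contain more.
ROBOTS_TXT = """\
User-agent: *
Disallow: /servlet/Cart
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.maturefitness.com/servlet/Cart",
        "http://www.maturefitness.com/servlet/Page?template=newsarchive",
    ):
        # No msnbot-specific group exists, so the "User-agent: *" group applies.
        verdict = "allowed" if parser.can_fetch("msnbot", url) else "blocked"
        print(url, verdict)  # Cart -> blocked, newsarchive -> allowed
```

    Because matching is by prefix, the rule also covers URLs such as /servlet/Cart?item=1, but this parser does not treat /servlet/cart (lower case) as blocked.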


    Now, I understand that many factors determine whether a site is listed in a search, but if you tell me this is how the indexing algorithm is supposed to work, my question would be: why? My site has content that is highly relevant to its target market, and that alone should earn it at least a listing above spammed porn ads in a search for my own URL. In fact, one of the pages that appeared today in the Live Search index (http://www.maturefitness.com/servlet/Page?template=newsarchive) is the archive of all past issues of the American Senior Fitness Association newsletter. This is the only place on the web where these issues are archived, and the archive was created with the Association's complete cooperation and support.


    I hope someone (Jeremiah?) from MSN will be able to help me with these problems. I would like to get my site at least listed in the index.


    Thanks for any assistance anyone may provide.

    Thursday, February 7, 2008 2:08 PM


All replies

  • I think the fact that you are indexed at all at this stage is very positive. It seems MSN ignores the robots.txt file for a period of time after it was last fetched. From the sound of it, I assume your sitemap is not listing the files you want kept out of the index? A week or so ago we inadvertently blocked the writing pad page in the robots file, but it was still in the sitemap and was still indexed by MSN. When we removed it from the sitemap, it stopped getting indexed. We've since corrected that issue.


    It sounds like you are off to a very positive start.  If the site was just launched on 1/18 I'd give it another 1-2 weeks to settle in.  Perhaps Jeremiah can shed more light on the robots issue as others are reporting similar problems.


    We found our solution in the sitemap, but it's hard to believe everyone is having the same issue.
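    The sitemap theory can be checked mechanically: list every <loc> entry in the gzipped sitemap and test each one against the robots.txt rules. This is a minimal sketch using only Python's standard library (gzip, xml.etree.ElementTree, urllib.robotparser). The sitemap and robots.txt contents are hypothetical examples, with the Cart URL deliberately included so the check has something to flag.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol for <urlset>/<url>/<loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(gz_bytes: bytes) -> list[str]:
    """Decompress a sitemap.xml.gz payload and return every <loc> URL in it."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]

def blocked_in_sitemap(gz_bytes: bytes, robots_txt: str, agent: str = "msnbot") -> list[str]:
    """Return sitemap URLs that robots.txt disallows for `agent` (these send mixed signals)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(gz_bytes) if not parser.can_fetch(agent, url)]

# Hypothetical inputs for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /servlet/Cart\n"
EXAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.maturefitness.com/</loc></url>
  <url><loc>http://www.maturefitness.com/servlet/Page?template=newsarchive</loc></url>
  <url><loc>http://www.maturefitness.com/servlet/Cart</loc></url>
</urlset>"""

if __name__ == "__main__":
    gz = gzip.compress(EXAMPLE_SITEMAP)
    print(blocked_in_sitemap(gz, ROBOTS_TXT))  # ['http://www.maturefitness.com/servlet/Cart']
```

    For a real run, pass the bytes of a local copy of sitemap.xml.gz (opened in "rb" mode) and the actual robots.txt text; an empty list means the two files agree.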

    Thursday, February 7, 2008 8:16 PM
  • Thank you for the encouragement. And no, the pages I've blocked in robots.txt are not listed in my sitemap.

    Thursday, February 7, 2008 10:16 PM
  •  Spheric wrote:

    4. robots.txt file is being ignored. You may view it here: "http://www.maturefitness.com/robots.txt". Note that the first Disallow is for "/servlet/Cart". This has worked to keep all of the other search engines off that page; however, Live Search has added it to the index. It is listed without a page title (as "www.maturefitness.com") in Webmaster Live and shows up under a "site:www.maturefitness.com" search as having been indexed. This Disallow has been in my robots.txt file for months, long before I opened the site. I have a screenshot of that search if you need it.


    An additional piece of information: in the Webmaster Live tools, this link also shows a language code of "zz" rather than "en". My entire site is in English and designated as such in the doctype of each page.
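    On the "zz" language code: a DOCTYPE identifies the markup version rather than the page language (in a public identifier such as "-//W3C//DTD XHTML 1.0 Strict//EN", the trailing EN names the language of the DTD itself). Page language is normally declared with a lang or xml:lang attribute on the <html> element, or a Content-Language meta tag. Whether Live Search reads any of these signals is not stated in the thread; this minimal sketch, using Python's standard html.parser, only reports which of them a page actually carries. The sample pages are hypothetical.

```python
from html.parser import HTMLParser

class LangDeclarationFinder(HTMLParser):
    """Collect language declarations from <html lang/xml:lang> and Content-Language meta tags."""

    def __init__(self):
        super().__init__()
        self.declared = {}

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        values = {name: (value or "") for name, value in attrs}
        if tag == "html":
            for key in ("lang", "xml:lang"):
                if key in values:
                    self.declared[key] = values[key]
        elif tag == "meta" and values.get("http-equiv", "").lower() == "content-language":
            self.declared["content-language"] = values.get("content", "")

def declared_languages(html: str) -> dict:
    finder = LangDeclarationFinder()
    finder.feed(html)
    finder.close()
    return finder.declared

# Hypothetical pages: one declares its language, the other only has a DOCTYPE.
PAGE_WITH_LANG = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    '<head><meta http-equiv="Content-Language" content="en" /></head><body></body></html>'
)
DOCTYPE_ONLY = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html><head></head><body></body></html>'

if __name__ == "__main__":
    print(declared_languages(PAGE_WITH_LANG))
    print(declared_languages(DOCTYPE_ONLY))  # {}
```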

    Friday, February 8, 2008 6:26 AM
  • Well, one more page was added to Webmaster Live on February 8, but there are still no results on a search for my domain name in Live Search. A "site:www.maturefitness.com/" search produces five pages, including the strange one that was supposed to be blocked by my robots.txt file, with a language code of "zz" and a title of "www.maturefitness.com". The Webmaster Live tools, however, say there are now six pages supposedly indexed.

    Monday, February 11, 2008 2:05 AM
  • You've got another one of those keyword domains like ours, which may also be triggering some issues. Does it say your site is blocked? I'd send an email to the web spam team at MS just in case, and I'm sure Jeremiah will have a look at some point. Our site, promomanagers, is similar in having a name that cuts both ways: its first five letters are often flagged by mail filters and the like.


    Good luck; you are way ahead of where we were, so keep building link relationships with others and keep building content.


    Monday, February 11, 2008 2:14 PM
  • Thank you again. I don't believe the site is blocked. The Webmaster Live tools say "Blocked: No", which I take to mean we are not blocked.

    Tuesday, February 12, 2008 12:43 PM

    Yay! Finally, a single page has actually been indexed so that it comes up in a search on my full domain name. I am still having the other problems, though, so I would still appreciate any assistance. Also, crawling one or two pages a week, MSNbot is going to take over a year to index my site, small as it is (only around 100 pages so far).
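    The crawl-rate estimate can be measured from the same server logs mentioned in the first post by counting the distinct URLs the crawler requested in each week. This is a minimal sketch, assuming the server writes the Apache/NCSA "combined" log format and that the crawler identifies itself with "msnbot" in its User-Agent string. The sample lines use documentation IP addresses and are invented for illustration, not taken from this site's logs. A User-Agent string can be spoofed, so a stricter check would also verify the requesting IP with a reverse DNS lookup.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache/NCSA "combined" format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def pages_crawled_per_week(lines, bot="msnbot"):
    """Count distinct URLs requested by `bot` in each ISO (year, week)."""
    weeks = defaultdict(set)
    for line in lines:
        match = COMBINED_LOG.match(line)
        if not match or bot.lower() not in match.group("agent").lower():
            continue  # unparseable line, or a different client
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        year, week, _ = when.isocalendar()
        weeks[(year, week)].add(match.group("path"))
    return {key: len(paths) for key, paths in sorted(weeks.items())}

# Hypothetical sample lines (documentation IP ranges), not real log data.
MSNBOT_UA = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [04/Feb/2008:10:00:00 -0600] "GET /sitemap.xml.gz HTTP/1.1" 200 1234 "-" "{MSNBOT_UA}"',
    f'192.0.2.10 - - [06/Feb/2008:11:00:00 -0600] "GET /servlet/Cart HTTP/1.1" 200 512 "-" "{MSNBOT_UA}"',
    '198.51.100.7 - - [06/Feb/2008:12:00:00 -0600] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    f'192.0.2.10 - - [12/Feb/2008:09:30:00 -0600] "GET / HTTP/1.1" 200 2048 "-" "{MSNBOT_UA}"',
]

if __name__ == "__main__":
    print(pages_crawled_per_week(SAMPLE_LOG))  # {(2008, 6): 2, (2008, 7): 1}
```

    To run it against a real log, pass an open file object (for example, open("access.log", encoding="utf-8", errors="replace"), where the filename is hypothetical); each line is parsed independently, so lines in other formats are simply skipped.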
    Wednesday, February 13, 2008 6:18 AM
  • Well, I just thought I'd post an update for you other webmasters out there. I am no longer seeking help; instead, I have given up on this broken piece of *** just like Microsoft has given up on us. Yes, I know they insist that it is not broken. It intentionally ignores the robots.txt file, ranks bot-spammed forum porn ads higher than legitimate sites, crawls and indexes sites slower than a script-kiddie bot, and from time to time deletes all previously indexed pages without explanation. This is, of course, all by design because Microsoft doesn't have the computing resources that the other search engine companies have at their disposal, and of course it will correct itself if we just get more high-ranking backlinks. There isn't really any number that you need, just more than you have. My site shows ten backlinks with five green bars (the tool only shows ten links, so...). I just want you all to understand that I have been keeping up.


    For the past couple of weeks, the tool has been saying I had around 40 pages indexed. Of course, there is no way to verify that, since the tool only shows five pages. However, a site search would only ever turn up 18 pages, consistently. And there was only one page that I could ever get to turn up in any keyword search.


    Well, today Live Search decided to dump all but three pages from the index. And one of those three is my zipped sitemap. So, I am now back down to two legitimate pages indexed.


    This is a freaking joke. No wonder Jeremiah no longer even bothers to show up. Oh, by the by, I am already first page for most of my relevant search terms on both Google and Yahoo. Both have had my entire site indexed for weeks now. I'm sure glad they haven't caught on to the fact that I'm scamming them by not having enough of the proper backlinks to my site.

    Saturday, March 1, 2008 2:46 PM
  • Hi,


    So I ran a couple of queries that I thought might be relevant to your site, and I believe you are battling against yourself for rank. You have .net and .org mirror sites that are taking first, second, etc. for "senior fitness certification": http://search.msn.com/results.aspx?q=Senior+Fitness+Certification&FORM=QSRE2


    Here is the result for "Mature Fitness"; again, your mirror sites take most of the first page, although you have one page for maturefitness.com ranked at #3: http://search.msn.com/results.aspx?q=mature+fitness&form=QBRE



    Tuesday, April 1, 2008 4:27 PM