How do I get rid of all of the repeats?

  • Question

  • I am brand new to this API and am using JSON for my results. I ran a query, but the first page of results is full of links to the same page, each with different variables in the URL. How can I clean this up?

    Here is the code.

            var requestStr = "http://api.search.live.net/json.aspx?"
           
                // Common request fields (required)
                + "AppId=" + AppId
                + "&Query=Cher%20site:harrahs.com%20OR%20site:caesarspalace.com"
                + "&Sources=Web"
               
                // Common request fields (optional)
                + "&Version=2.0"
                + "&Market=en-us"
                + "&Adult=Moderate"
                + "&Options=EnableHighlighting"

                // Web-specific request fields (optional)
                + "&Web.Count=10"
                + "&Web.Offset=0"
                + "&Web.FileType=DOC"
                + "&Web.Options=DisableHostCollapsing+DisableQueryAlterations"

                // JSON-specific request fields (optional)
                + "&JsonType=callback"
                + "&JsonCallback=SearchCompleted";
    Tuesday, March 24, 2009 9:16 PM

Answers

  • FYI: DisableHostCollapsing is not required when using the site: operator.

    A small syntax error in your query is causing the relevance problem you are seeing: the two site: operators need to be grouped in parentheses (URL-encoded as %28 and %29). Try changing your query to the following:
    "&Query=Cher+%28site%3Aharrahs.com+OR+site%3Acaesarspalace.com%29" 

    Also, you should remove Web.FileType=DOC from your request unless you are really looking for Word documents (we don't appear to have any indexed for these sites).

    Good luck!
    • Marked as answer by AlessC Saturday, March 28, 2009 5:40 AM
    Saturday, March 28, 2009 12:05 AM
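    For reference, here is a sketch of the request from the question rebuilt with URLSearchParams, which percent-encodes the query for you (spaces become +, and the parentheses and colons become %28, %29 and %3A), producing the same Query string suggested above. It has not been run against the service: the endpoint and parameter names are copied from the question, APP_ID is a placeholder, and it applies the three changes from this answer (grouped site: operators, no Web.FileType=DOC, no DisableHostCollapsing).

```javascript
// Sketch only: rebuilds the question's Live Search API 2.0 request URL
// and lets URLSearchParams handle the percent-encoding.
// APP_ID is a placeholder; substitute your own AppId.
const APP_ID = "YOUR_APP_ID";

function buildRequestUrl(appId) {
  const params = new URLSearchParams({
    // Common request fields (required)
    AppId: appId,
    // Parentheses group the two site: operators so "Cher" applies to both.
    Query: "Cher (site:harrahs.com OR site:caesarspalace.com)",
    Sources: "Web",
    // Common request fields (optional)
    Version: "2.0",
    Market: "en-us",
    Adult: "Moderate",
    Options: "EnableHighlighting",
    // Web-specific request fields (optional). Web.FileType=DOC is removed,
    // and DisableHostCollapsing is dropped because site: makes it unnecessary.
    "Web.Count": "10",
    "Web.Offset": "0",
    "Web.Options": "DisableQueryAlterations",
    // JSON-specific request fields (optional)
    JsonType: "callback",
    JsonCallback: "SearchCompleted",
  });
  // toString() uses form encoding: space -> "+", "(" -> "%28", ":" -> "%3A".
  return "http://api.search.live.net/json.aspx?" + params.toString();
}

const requestStr = buildRequestUrl(APP_ID);
console.log(requestStr);
```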

All replies

  • I am not sure I understand exactly what the problem is. The query you are issuing does return several pages with the same title but different URLs. See the results on Live.com:
    http://search.live.com/results.aspx?q=Cher+site%3Aharrahs.com+OR+site%3Acaesarspalace.com&form=QBNO

    If this is what you are seeing in the API response, the problem lies in how the sites you are searching are structured and how they handle SEO. They appear to reuse a lot of the same material across different pages, and each of those pages is indexed with that material.
    An easy fix on your side would be to make your query more specific.

    HTH

    --Alessandro
    Wednesday, March 25, 2009 4:32 AM
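    If the duplicates are pages that share a title, one option is to filter them on the client before display. The sketch below keeps the first result for each title. It assumes the JSON response exposes the results as SearchResponse.Web.Results, each with Title and Url fields; treat that shape as an assumption. The SearchCompleted function here is a hypothetical stand-in for the callback named in the question, and the sample data is made up.

```javascript
// Sketch: drop results whose Title has already been seen, keeping the first.
// Assumes each result object has Title and Url fields (an assumption about
// the Live Search API 2.0 JSON response, not verified here).
function dedupeByTitle(results) {
  const seen = new Set();
  return results.filter((r) => {
    // Collapse whitespace and ignore case so trivially different titles match.
    const key = r.Title.trim().replace(/\s+/g, " ").toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Hypothetical version of the SearchCompleted callback from the question.
// A real page would render the filtered list instead of returning it.
function SearchCompleted(response) {
  const web = response.SearchResponse && response.SearchResponse.Web;
  const results = web && web.Results ? web.Results : [];
  return dedupeByTitle(results);
}

// Made-up sample data; the URLs are illustrative only.
const sample = {
  SearchResponse: {
    Web: {
      Results: [
        { Title: "Cher at Caesars Palace", Url: "http://example.com/a?x=1" },
        { Title: "Cher at  Caesars Palace", Url: "http://example.com/a?x=2" },
        { Title: "Colosseum Shows", Url: "http://example.com/b" },
      ],
    },
  },
};
console.log(SearchCompleted(sample).map((r) => r.Url));
```

    Matching on title is a blunt heuristic: two genuinely different pages with the same title will also be collapsed, so a more specific query is still the cleaner fix.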