Saturday, May 16, 2009 8:05 AM
Hello,
I cannot find a way to restrict the web search results to a given language in the API. I don't see any suitable parameter. I tried adding, for example, language:en to the query (e.g. query=chat+language:en), but the API keeps returning results in French (probably based on my IP location). Adding Options=DisableLocationDetection does not improve the situation.
I would appreciate any hints.
Many thanks !
Saturday, May 16, 2009 8:58 PM
Good question. I'm interested in this too because we are about to add a language restriction to our search engine (noflail.com). We were planning to use the "language:" keyword for that purpose. We had found that it works well, but when I saw your message I experimented some more. In our search engine the user can declare his or her location (country or region) and language, so I went to noflail.com and set location/language to France/French (this results in the parameter "&market=fr-fr" being added to the URL). Then I issued the query "nba". All results were in French. Then I issued the query "nba language:en" and many or even most results were still in French, confirming your observation. Curiously, if I repeat the experiment using Spain/Spanish as location/language (which is what I usually experiment with), the query "nba language:en" has almost all its results in English.
It would be useful to know how the backend determines the language of a page, so that we can explain that to our users if they complain about getting the language wrong. Actually, what I find really surprising is that the backend can make the determination at all. For example, the second result of the query "nba" with market "France/French" is a page related to "Normandie Bretagne Automobiles occasions". When I issue the query "nba language:en", that result goes away. The result does not even appear for the query "Normandie Bretaghe Automobiles language:en". So the backend must be pretty sure that the page is in French. How does the backend know that? I've looked at the page and I cannot find anything in the page itself or the HTTP headers that says that the language is French. The DOCTYPE even says that the page is in English!
Perhaps the backend looks at the page contents and uses a dictionary to see if there are French words or English words or both? Some of the NBA pages related to the National Basketball Association are written in French but do have sections or words in English, so that could be an explanation.
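To make the dictionary guess above concrete, here is a minimal sketch of what such a check could look like. It only illustrates the hypothesis; nothing in this thread confirms that the backend works this way. The word lists and the function name guess_language are made up for the example.

```python
import re

# Tiny, hand-picked stop-word lists. These are illustrative assumptions,
# not the dictionaries any real search backend uses.
STOPWORDS = {
    "fr": {"le", "la", "les", "des", "et", "est", "un", "une", "du", "pour", "dans", "que"},
    "en": {"the", "and", "is", "of", "to", "in", "for", "that", "with", "on", "this", "are"},
}


def guess_language(text):
    """Return (language or None, per-language stop-word counts) for a piece of text."""
    # Split into runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_lang, best_score), (_, runner_up) = ranked[0], ranked[1]
    # No stop words at all, or a tie: refuse to guess.
    if best_score == 0 or best_score == runner_up:
        return None, scores
    return best_lang, scores


if __name__ == "__main__":
    print(guess_language("Les voitures d'occasion sont dans le garage et la cour"))
    print(guess_language("The game is on this week and the players are ready"))
```

A page that mixes both languages gets non-zero counts for each, so the outcome would depend on how the counts are weighed. That is one way a mostly French page with an English menu could end up classified either way.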
Tuesday, May 19, 2009 1:26 AM Owner
Setting the market in the API request tells Live Search to give more *preference* to web pages in the given market (which translates to a language and a location, such as en and US when market = en-US). Adding the language: keyword to the query string tells Live Search to restrict the results to pages that are in that language.
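As a rough sketch of the difference, the helper below builds a request URL in which market is a separate parameter (a preference) and language: is part of the query text (a restriction). The names query, market and Options are the ones used in this thread. The base URL, the AppId and Sources parameters, and the function name build_search_url are placeholders assumed for illustration, so please check them against the API documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint, assumed for illustration only.
BASE_URL = "https://api.example.invalid/json.aspx"


def build_search_url(terms, app_id, market=None, language=None,
                     disable_location=False):
    """Build a web search URL with an optional market and language restriction."""
    # language: goes inside the query text, so it restricts the results.
    query = terms if language is None else f"{terms} language:{language}"
    params = {
        "AppId": app_id,    # assumed parameter name
        "Sources": "Web",   # assumed parameter name
        "query": query,
    }
    if market is not None:
        # market is a separate parameter, so it only expresses a preference.
        params["market"] = market
    if disable_location:
        params["Options"] = "DisableLocationDetection"
    # Keep ':' unescaped so the keyword reads as language:en in the URL.
    return BASE_URL + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    print(build_search_url("nba", "YOUR_APP_ID", market="fr-fr", language="en"))
```

Comparing the same terms with and without the language argument, under a fixed market, isolates the effect of the keyword from the effect of the market.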
Having said that, can you please give me the exact API request for the case you mentioned, where language: does not seem to work?
In particular, I am interested in the value you set for market=.
Once I have that I will flag the issue to my team.
Thx for your feedback!
- Marked As Answer by AlessC (Owner), Tuesday, May 19, 2009 8:48 PM
Wednesday, May 20, 2009 2:27 AM
Hi Roopali,
Thanks for your explanation. Here is the URL of the API request:
As you can see, the market is fr-fr and the query is: nba language:en. Results 1, 2, 3, and 8 are in French. Result 2, www.nba.com/france/tv.html, is a French page but has an English menu at the bottom, which may explain why it is included. Results 1, 3 and 8 are missing the description, I suppose because they are RIAs (rich Internet applications), which may mean that their language could not be determined and that's why they are included.
(Three days ago there was a more interesting result for the same query, but it's gone now. If I remember correctly, everything was in French except for the word "News". But I may not have looked very carefully.)
Here is another example, where most of the results are in French:
The market is fr-fr and the query is: mines language:en.
I did View Source on the first three results and I can see possible explanations for the problem. Results 1 (re. Mines de Douai) and 3 (re. Mines de Nancy) create the page on the client using document.write(). Result 2 (re. Mines de Paris) has a redirection.
What this seems to say is that it's really hard to consistently determine the language. (One suggestion would be to follow redirections and execute the document.write() statements before determining the language, but that's probably difficult to do and computationally expensive...)
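For anyone who wants to check their own results the same way I did with View Source, here is a small sketch that flags pages whose static HTML may not reflect what the user sees. It only looks for document.write() calls and for a meta refresh tag. The meta refresh check is an assumption, since a redirection can also be done with an HTTP status code or with script. The function name language_detection_risks is made up for the example.

```python
import re

# JavaScript is case-sensitive, so match document.write / document.writeln exactly.
DOCUMENT_WRITE = re.compile(r"document\.write(?:ln)?\s*\(")
# HTML attributes are case-insensitive; quotes around the value are optional.
META_REFRESH = re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.IGNORECASE)


def language_detection_risks(html):
    """List reasons why the static HTML may not show the page's real content."""
    risks = []
    if DOCUMENT_WRITE.search(html):
        risks.append("document.write")
    if META_REFRESH.search(html):
        risks.append("meta-refresh")
    return risks


if __name__ == "__main__":
    print(language_detection_risks('<script>document.write("<p>Bonjour</p>")</script>'))
    print(language_detection_risks('<meta http-equiv="refresh" content="0; url=/fr/">'))
    print(language_detection_risks("<p>Hello</p>"))
```

This does not determine the language. It only marks the results for which a language decision based on the raw source is more likely to be wrong.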
Wednesday, May 20, 2009 7:16 PM
Thanks for your detailed feedback, Francisco. This is important for us. I am going to take this and run it by the appropriate team in Live Search.