encoding of accent characters in Spanish

  • Question

  • When I request an English-to-Spanish translation from Microsoft Translator for the text "For Spanish, press 2 then number sign.", I receive the translated text "Para espa�ol, pulse 2 y luego la tecla numeral." instead of the expected "Para español, pulse 2 y luego la tecla numeral.". I assume the difference between the undesired "espa�ol" and the desired "español" has to do with the encoding of the accented character "ñ".

    What can I change in my request handling to get the desired "español" response?

    FYI, this is the simple request being made:

    http://api.microsofttranslator.com/V2/Http.svc/Translate?appId=YOUR_APP_ID_HERE&contentType=text/html&from=en&to=es&text=For Spanish%2c press 2 then number sign%2e

    This is the response I receive:

    <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Para espa�ol, pulse 2 y luego la tecla numeral.</string>

    Thanks.

    Monday, August 22, 2011 5:41 PM
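
For reference, the text parameter in a request like the one above is normally percent-encoded from its UTF-8 bytes before being placed in the URL. A minimal C++ sketch, assuming the input string is already UTF-8; the helper name percentEncodeUtf8 is hypothetical and is not part of WinHTTP or the Translator API:

```cpp
#include <string>

// Percent-encode a UTF-8 byte string for use as a URL query value.
// Unreserved ASCII characters (RFC 3986) pass through unchanged; every
// other byte, including each byte of a multi-byte UTF-8 sequence,
// becomes %XX.
// Hypothetical helper for illustration only.
std::string percentEncodeUtf8(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this helper, "español" (with ñ stored as the UTF-8 bytes C3 B1) would be sent as espa%C3%B1ol, and spaces would be sent as %20.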

Answers

  • Explanation: As I mentioned in the first post, I am using the AddTranslation method to add translations to the Microsoft Translator's memory. It turns out that I added an "improper" translation because I handled the input text poorly when I submitted the translation using the AddTranslation method. My code to request translations works properly, but the Microsoft Translator memory now sends back the invalid character I "taught" it.


    Jim Kazmer
    • Marked as answer by JimKazmer Monday, August 22, 2011 10:57 PM
    Monday, August 22, 2011 10:57 PM
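
One way to guard against this kind of mistake is to check text before submitting it with AddTranslation. The sketch below assumes the text is meant to be UTF-8 and rejects strings that are malformed or that already contain U+FFFD (the replacement character). The helper isCleanUtf8 is hypothetical and is not part of the Translator API:

```cpp
#include <cstddef>
#include <string>

// Returns true only if `s` is well-formed UTF-8 and contains no U+FFFD
// (the replacement character, encoded as bytes EF BF BD).
// Hypothetical helper for illustration only.
bool isCleanUtf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long minCp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp[len]) return false;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // out of Unicode range
        if (cp == 0xFFFD) return false;                  // replacement character
        i += len;
    }
    return true;
}
```

A string holding "español" as proper UTF-8 passes this check. The same word with ñ stored as the single Latin-1 byte 0xF1, or with ñ already replaced by EF BF BD, does not.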

All replies

  • Update: my current thinking is that api.microsofttranslator.com is returning some form of UTF-8 encoding. There are two possibilities: (1) Microsoft Translator is sending an incorrect UTF-8 encoding, or (2) my server-side application is misinterpreting the UTF-8 encoding. That suggests two possible fixes: (1) convert "�" (decimal bytes 239, 191, 189) into "ñ" on my end, or (2) change the request or my system so that "ñ" is received.

    Any insight you may have to direct my focus would be appreciated. FYI, I am using WinHTTP in C++ to make and receive my translation requests.

     

    Monday, August 22, 2011 6:58 PM
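
A note on the bytes quoted above: decimal 239, 191, 189 is EF BF BD in hex, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character. It is substituted when a byte could not be decoded, so the original "ñ" cannot be recovered from it on the receiving end. If the underlying cause is that text held in a single-byte Latin-1 encoding (where ñ is the byte 0xF1) was handed to something expecting UTF-8, which is an assumption the thread does not confirm, then converting before sending would avoid the problem. A minimal sketch, with latin1ToUtf8 as a hypothetical helper name:

```cpp
#include <string>

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte maps directly to the Unicode code point of the same
// value, so bytes below 0x80 are copied as-is and bytes 0x80..0xFF
// become two-byte UTF-8 sequences.
// Hypothetical helper for illustration only.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

Here the Latin-1 byte 0xF1 for ñ becomes the two UTF-8 bytes C3 B1.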
  • The Microsoft Translator HTTP response header, found using WinHttpQueryHeaders(), is:

    HTTP/1.1 200 OK
    Date: Mon, 22 Aug 2011 20:24:17 GMT
    Content-Length: 118
    Content-Type: application/xml; charset=utf-8
    X-MS-Trans-Info: s=63644

    Monday, August 22, 2011 8:42 PM