Using translateArray with language auto-detect

  • Question

  • Hi:

    I am trying to use the translateArray HTTP API without specifying the From language parameter.  Is it possible to submit texts in different source languages?  The texts seem to be translated correctly into the target language, but the detected source language returned is always that of the first input text; see the example below.

    Request:

    <?xml version="1.0" encoding="utf-8"?>
    <TranslateArrayRequest>
     <AppId/>
     <From/>
     <Texts>
      <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
       這是一句中文例句。
      </string>
      <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
       This is an English string.
      </string>
     </Texts>
     <To>
      ja
     </To>
    </TranslateArrayRequest>

    Here's the response I got.  Somehow the "From" is "zh-CHT" for both strings. 

    <?xml version="1.0" encoding="utf-8"?>
    <ArrayOfTranslateArrayResponse xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
     <TranslateArrayResponse>
      <From>
       zh-CHT
      </From>
      <OriginalTextSentenceLengths>
       <int>
        9
       </int>
      </OriginalTextSentenceLengths>
      <TranslatedText>
       これは、中国語の例文です。
      </TranslatedText>
      <TranslatedTextSentenceLengths>
       <int>
        13
       </int>
      </TranslatedTextSentenceLengths>
     </TranslateArrayResponse>
     <TranslateArrayResponse>
      <From>
       zh-CHT
      </From>
      <OriginalTextSentenceLengths>
       <int>
        26
       </int>
      </OriginalTextSentenceLengths>
      <TranslatedText>
       これは、英語の文字列です。
      </TranslatedText>
      <TranslatedTextSentenceLengths>
       <int>
        13
       </int>
      </TranslatedTextSentenceLengths>
     </TranslateArrayResponse>
    </ArrayOfTranslateArrayResponse>

    Monday, August 10, 2015 4:59 PM

Answers

  • Hi TK,

    All elements of the array are translated using the same source language. The source language is detected over all elements together, and the statistically most likely language wins.

    Let us know if this helps,
    Chris Wendt
    Microsoft Translator

    Thursday, August 13, 2015 5:08 AM

All replies

  • Hi Chris:

    Thanks for the clarification, that makes sense.  From the response format I had assumed that each text in the array was detected independently, since there is a separate "From" element within each TranslateArrayResponse element.

    Thursday, August 13, 2015 3:57 PM
  • Yes, the From element is for convenience.

    We were overly cautious when designing the method. Language detection generally achieves higher accuracy on longer runs of text. Since then, auto-detect has improved and works OK on shorter segments, so we may revise the behavior in a future version of the API.

    You can use DetectArray() to get the language individually for each element, then split your array according to the responses into same language groups, then make a TranslateArray() call for each group.

    Hope this helps,
    Chris Wendt
    Microsoft Translator

    Thursday, August 13, 2015 4:02 PM
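
The workaround described in the last reply (detect per element, split into same-language groups, translate each group) could be sketched roughly as below. This is only an illustration of the grouping and re-ordering logic, not the Translator client itself: `detect_array` and `translate_array` are hypothetical callables standing in for whatever code wraps the DetectArray() and TranslateArray() HTTP calls, and their signatures are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical signatures (assumptions, not part of the Translator API):
#   detect_array(texts) -> one language code per text, in the same order
#   translate_array(texts, from_lang, to_lang) -> one translation per text
DetectFn = Callable[[Sequence[str]], List[str]]
TranslateFn = Callable[[Sequence[str], str, str], List[str]]


def group_by_language(
    texts: Sequence[str], languages: Sequence[str]
) -> Dict[str, List[Tuple[int, str]]]:
    """Map each detected language to the (original index, text) pairs in it."""
    if len(texts) != len(languages):
        raise ValueError("expected exactly one detected language per text")
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for index, (text, lang) in enumerate(zip(texts, languages)):
        groups[lang].append((index, text))
    return dict(groups)


def translate_mixed(
    texts: Sequence[str],
    to_lang: str,
    detect_array: DetectFn,
    translate_array: TranslateFn,
) -> List[Optional[Tuple[str, str]]]:
    """Return (detected language, translation) per text, in the input order."""
    languages = detect_array(texts)
    results: List[Optional[Tuple[str, str]]] = [None] * len(texts)
    # One translate call per same-language group, with an explicit source language.
    for lang, items in group_by_language(texts, languages).items():
        indices = [index for index, _ in items]
        batch = [text for _, text in items]
        translated = translate_array(batch, lang, to_lang)
        # Put each translation back at its original position.
        for index, output in zip(indices, translated):
            results[index] = (lang, output)
    return results
```

With the two strings from the question, this approach would issue one translate call for the Chinese text and one for the English text, each with its own source language, and return the results in the original order.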