Error Translating content (HTML format) more than 2048 bytes

  • Question

  • Hi,

    We have developed a translation solution using the service proxy class TranslatorContainer.cs (https://datamarket.azure.com/dataset/1899a118-d202-492c-aa16-ba21c33c06cb), following the methodology described by Peter Puszkiewicz (http://code.msdn.microsoft.com/Walkthrough-Translator-in-7e0be0f7#content).

    We load the translated content into the editor (ASP.NET Web Forms); both the source and the translated content are in HTML format. The request fails with error code 404 when the URL-encoded content exceeds 2048 bytes; smaller content works fine.

    We opened a support ticket with the Microsoft Azure team, and the following was their response:

    "The problem is that the request is bigger than 2048 bytes. And since it is translated from an HTML source, the encoded request is much bigger than the original text.
    The translated request should be less than 2048 bytes.

    This is by design."

    I wonder how Microsoft Word can translate an entire page (Review > Translate) when the Translator API is limited to 2048 bytes per transaction!

    I would really appreciate it if anyone has a workaround for this issue.

    Thank you.


    • Edited by ksiu Thursday, June 28, 2012 10:22 PM
    Thursday, June 28, 2012 5:24 PM
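
To see why HTML trips the limit so quickly, it can help to measure the text after URL encoding. The following is a minimal sketch, not part of the original solution: `EncodedSize` is a hypothetical helper, the 2048 figure is simply the one quoted by support above, and it uses `Uri.EscapeDataString` rather than `HttpUtility.UrlEncode`, so the byte counts may differ slightly (for example, a space becomes `%20` instead of `+`).

```csharp
using System;
using System.Text;

public static class EncodedSize
{
    // Limit quoted in the support response; an assumption, not a documented constant.
    public const int MaxEncodedBytes = 2048;

    // Number of bytes the text occupies once it is URL encoded.
    public static int Measure(string text)
    {
        string encoded = Uri.EscapeDataString(text);
        return Encoding.UTF8.GetByteCount(encoded);
    }

    // True when the encoded text stays below the limit.
    public static bool Fits(string text)
    {
        return Measure(text) < MaxEncodedBytes;
    }
}
```

For example, `EncodedSize.Measure("<p>Hello</p>")` returns 22 for 12 characters of input, because each of `<`, `>` and `/` expands to three characters.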

Answers

  • I sent you the code...did you get a chance to try it?

    For everyone else, here is some code. I'm continuing to polish this into an app, but if you need a starter, here it is...

    First, you should download and install the HTML Agility Pack: http://htmlagilitypack.codeplex.com/

    Then, try code like this. It does need some tidying up, most notably in the creation of the language list, which shouldn't be hard coded, but it works well...

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Text;
    using System.Web;
    using System.Windows.Forms;
    using System.IO;
    using HtmlAgilityPack;

    namespace PageTranslator
    {
        public partial class Form1 : Form
        {
            private DateTime tokenAge=DateTime.Now;
            private string Access_Token="";
            private static string strFrom = "en";
            private static string strTo = "es";

            public Form1()
            {
                InitializeComponent();
            }

            // Combo items look like "en : English"; return the code before the colon.
            private string GetLangFromCombo(ComboBox cmb)
            {
                string strSelected="en";
                if(cmb.SelectedItem!=null)
                    strSelected = cmb.SelectedItem.ToString();
                strSelected = strSelected.Split(':')[0].Trim();
                return strSelected;
            }

            private void button1_Click(object sender, EventArgs e)
            {
                Stream myStream = null;
                strFrom = GetLangFromCombo(comboBox1);
                strTo = GetLangFromCombo(comboBox2);
                OpenFileDialog openFileDialog1 = new OpenFileDialog();

                openFileDialog1.InitialDirectory = "c:\\";
                openFileDialog1.Filter = "HTM files (*.htm)|*.htm|HTML files (*.html)|*.html";
                openFileDialog1.FilterIndex = 2;
                openFileDialog1.RestoreDirectory = true;
                if (openFileDialog1.ShowDialog() == DialogResult.OK)
                {
                    try
                    {
                        if ((myStream = openFileDialog1.OpenFile()) != null)
                        {
                            string strFileName = openFileDialog1.FileName;
                            textBox3.Text = strFileName;
                            HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
                            html.Load(myStream);
                            processDocument(html);
                            // Save next to the source file, e.g. page.htm -> page_es.htm
                            string strReplace = "_" + strTo + ".htm";
                            strFileName = strFileName.Replace(".htm", strReplace);
                            html.Save(strFileName);
                            MessageBox.Show("Translation complete and saved to: " + strFileName);
                        }
                    }
                    catch (Exception ex)
                    {
                        MessageBox.Show("Error: " + ex.Message);
                    }
                }
            }

            private void processDocument(HtmlAgilityPack.HtmlDocument html)
            {
                // Select every non-empty text node; each one is translated separately.
                HtmlNodeCollection coll = html.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
                // SelectNodes returns null when nothing matches.
                if (coll == null)
                    return;
                double tot = coll.Count;
                double n = 0;
                double comp = 0;
                foreach (HtmlNode node in coll)
                {
                    n++;
                    System.Diagnostics.Debug.WriteLine(node.InnerText.Trim());
                    if (node.InnerText == node.InnerHtml)
                    {
                        node.InnerHtml = translateText(node.InnerText);
                    }
                    // Update the progress label.
                    comp = (n / tot) * 100;
                    label5.Text = Math.Round(comp,2) + "%";
                    Application.DoEvents();
                }
            }

            string translateText(string strInput)
            {
                // Refresh the token when it is older than 8 minutes.
                // TotalMinutes is used because Minutes is only the 0-59 component of the elapsed time.
                if ((DateTime.Now.Subtract(tokenAge).TotalMinutes > 8) || Access_Token=="")
                {
                    GetAccessToken();
                }
                string txtToTranslate = strInput;
                string headerValue = "Bearer " + Access_Token;

                string uri = "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=" + System.Web.HttpUtility.UrlEncode(txtToTranslate) + "&from=" + strFrom + "&to=" + strTo;
                System.Net.WebRequest translationWebRequest = System.Net.WebRequest.Create(uri);
                translationWebRequest.Headers.Add("Authorization", headerValue);
                System.Net.WebResponse response = null;
                response = translationWebRequest.GetResponse();
                System.IO.Stream stream = response.GetResponseStream();
                System.Text.Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
                System.IO.StreamReader translatedStream = new System.IO.StreamReader(stream, encode);
                // The response is a small XML document whose text is the translation.
                System.Xml.XmlDocument xTranslation = new System.Xml.XmlDocument();
                xTranslation.LoadXml(translatedStream.ReadToEnd());
                return xTranslation.InnerText;
            }

            void GetAccessToken()
            {
                string clientID = "YourClientID";
                string clientSecret = "YourClientSecret";
                String strTranslatorAccessURI = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13";
                String strRequestDetails = string.Format("grant_type=client_credentials&client_id={0}&client_secret={1}&scope=http://api.microsofttranslator.com", HttpUtility.UrlEncode(clientID), HttpUtility.UrlEncode(clientSecret));
                System.Net.WebRequest webRequest = System.Net.WebRequest.Create(strTranslatorAccessURI);
                webRequest.ContentType = "application/x-www-form-urlencoded";
                webRequest.Method = "POST";
                byte[] bytes = System.Text.Encoding.ASCII.GetBytes(strRequestDetails);
                webRequest.ContentLength = bytes.Length;
                using (System.IO.Stream outputStream = webRequest.GetRequestStream())
                {
                    outputStream.Write(bytes, 0, bytes.Length);
                }
                System.Net.WebResponse webResponse = webRequest.GetResponse();
                System.Runtime.Serialization.Json.DataContractJsonSerializer serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(AdmAccessToken));
                //Get deserialized object from JSON stream
                AdmAccessToken token = (AdmAccessToken)serializer.ReadObject(webResponse.GetResponseStream());
                Access_Token = token.access_token;
                tokenAge = System.DateTime.Now;
            }

            List<string> GetLanguages()
            {
                List<string> lstReturn = new List<string>();
                lstReturn.Add("ar : Arabic");
                lstReturn.Add("bg : Bulgarian");
                lstReturn.Add("ca : Catalan");
                lstReturn.Add("zh-CHS : Chinese Simplified");
                lstReturn.Add("zh-CHT : Chinese Traditional");
                lstReturn.Add("cs : Czech");
                lstReturn.Add("da : Danish");
                lstReturn.Add("nl : Dutch");
                lstReturn.Add("en : English");
                lstReturn.Add("et : Estonian");
                lstReturn.Add("fi : Finnish");
                lstReturn.Add("fr : French");
                lstReturn.Add("de : German");
                lstReturn.Add("el : Greek");
                lstReturn.Add("ht : Haitian Creole");
                lstReturn.Add("he : Hebrew");
                lstReturn.Add("hi : Hindi");
                lstReturn.Add("hu : Hungarian");
                lstReturn.Add("id : Indonesian");
                lstReturn.Add("it : Italian");
                lstReturn.Add("ja : Japanese");
                lstReturn.Add("ko : Korean");
                lstReturn.Add("lv : Latvian");
                lstReturn.Add("lt : Lithuanian");
                lstReturn.Add("mww : Hmong Daw");
                lstReturn.Add("no : Norwegian");
                lstReturn.Add("pl : Polish");
                lstReturn.Add("pt : Portuguese");
                lstReturn.Add("ro : Romanian");
                lstReturn.Add("ru : Russian");
                lstReturn.Add("sk : Slovak");
                lstReturn.Add("sl : Slovenian");
                lstReturn.Add("es : Spanish");
                lstReturn.Add("sv : Swedish");
                lstReturn.Add("th : Thai");
                lstReturn.Add("tr : Turkish");
                lstReturn.Add("uk : Ukrainian");
                lstReturn.Add("vi : Vietnamese");
                return lstReturn;
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                // Fill both the source and target language combo boxes.
                List<string> langList = GetLanguages();
                foreach (string s in langList)
                {
                    comboBox1.Items.Add(s);
                    comboBox2.Items.Add(s);
                }
            }
        }

        // Shape of the JSON returned by the token service.
        public class AdmAccessToken
        {
            public string access_token { get; set; }
            public string token_type { get; set; }
            public string expires_in { get; set; }
            public string scope { get; set; }
        }
    }

    • Proposed as answer by Laurence Moroney Monday, July 23, 2012 6:45 PM
    • Marked as answer by ksiu Friday, August 10, 2012 7:53 PM
    Monday, July 23, 2012 6:45 PM

All replies

  • Whole-page translation works through the page chunk by chunk, so that each transaction stays within the 2048-byte limit.

    I have a desktop app that can be used for translating entire HTML pages with no size restriction that I've been working on. It's not quite ready to release, but if you want to try it out, please drop me a line at v-lamo@@@@microsoft-dot-com

    Monday, July 2, 2012 1:57 PM
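
As an illustration of the chunk-by-chunk idea in the reply above, here is a minimal sketch that splits plain text on whitespace so that each piece stays within a URL-encoded byte budget. It is not the code from the desktop app mentioned above. `TextChunker` and `maxEncodedBytes` are hypothetical names, the encoding is measured with `Uri.EscapeDataString` (an assumption; `HttpUtility.UrlEncode` gives slightly different sizes), and runs of whitespace are collapsed to a single space. A single word that is larger than the budget is emitted as its own oversized chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextChunker
{
    // Size of the text in bytes after URL encoding.
    static int EncodedBytes(string s)
    {
        return Encoding.UTF8.GetByteCount(Uri.EscapeDataString(s));
    }

    // Greedily packs whitespace-separated words into chunks whose
    // encoded size does not exceed maxEncodedBytes.
    public static List<string> Split(string text, int maxEncodedBytes)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();
        string[] words = text.Split(new[] { ' ', '\t', '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            string candidate = current.Length == 0
                ? word
                : current.ToString() + " " + word;

            if (current.Length > 0 && EncodedBytes(candidate) > maxEncodedBytes)
            {
                // Adding this word would overflow: close the current chunk.
                chunks.Add(current.ToString());
                current.Clear();
                current.Append(word);
            }
            else
            {
                current.Clear();
                current.Append(candidate);
            }
        }

        if (current.Length > 0)
            chunks.Add(current.ToString());

        return chunks;
    }
}
```

In practice the budget should be set below 2048, since the rest of the request URL (the `from` and `to` parameters, for instance) also counts toward the request size.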
  • Hi Laurence,

    Thank you for your response.

    I have emailed you; could you please reply with your solution?

    Thank you.



    • Edited by ksiu Friday, July 6, 2012 1:37 PM
    Friday, July 6, 2012 1:36 PM
  • Thanks Laurence for the solution.

    It worked for a couple of requests, and then it started failing with "400: Bad request" on this line:

    response = translationWebRequest.GetResponse();

    Catching the exception and reading the response body,

    txtError.Text = new StreamReader(ex.Response.GetResponseStream()).ReadToEnd();
    shows the following description:
    <html><body><h1>TranslateApiException</h1><p>Method: Translate()</p><p>Message: Cannot find an active Azure Market Place Translator Subscription associated with the request credentials.</p><code></code><p>message id=3641.V2_Rest.Translate.45F84135</p></body></html>

    From the account page https://datamarket.azure.com/account, I use "CustomerID" as the client ID and "Primary account key" as the client secret. I have not registered the application (the Azure team told me that this is not required to run the Translator API), and the same code was working until Friday evening with my other prototype. I can still translate without any error using the proxy class implementation I mentioned earlier (June 28), and also through the widget on the account page (My Account > My Data > Use). The subscription is active.

    Also, when I added HtmlAgilityPack to my project using NuGet, it kept looking for HtmlDocument.cs under the "C:\" drive at run time. To solve the problem, I downloaded the source code, compiled the project, and referenced that DLL from my translation project.

    Thank you for any suggestions/solution.

    Monday, July 30, 2012 4:39 PM
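
The error-reading step in the post above can be wrapped in a small helper. This is only a sketch: `WebErrors.Describe` is a hypothetical name, and it adds one guard that the one-liner lacks, namely that `WebException.Response` is null when the failure happened before any HTTP response arrived (a DNS or connection error, for instance).

```csharp
using System;
using System.IO;
using System.Net;

public static class WebErrors
{
    // Returns the server's error body if there is one,
    // otherwise falls back to the exception message.
    public static string Describe(WebException ex)
    {
        if (ex.Response == null)
            return ex.Message;

        using (Stream body = ex.Response.GetResponseStream())
        using (var reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }
}
```

It would be called from a `catch (WebException ex)` block around `translationWebRequest.GetResponse()`, for example `txtError.Text = WebErrors.Describe(ex);`.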
  • The access token expires after 10 minutes, so you should track when you obtained it and request a new one once 10 minutes have passed. That's likely the solution to the first issue.

    My code sets a 'tokenAge' variable to the current date/time when it gets a token, so it would be pretty simple to amend it to check whether 10 minutes have elapsed.

    As for HtmlAgilityPack, I don't know (sorry!), maybe post on their site?

    Monday, July 30, 2012 4:52 PM
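
A minimal sketch of the expiry check described in the reply above. `TokenClock` is a hypothetical class, and the lifetime passed in is up to the caller (something under the 10 minutes mentioned above leaves a safety margin). One detail worth getting right in any such check: `TimeSpan.Minutes` is only the 0-59 minutes component of the elapsed time, so the comparison should use whole `TimeSpan` values or `TotalMinutes`.

```csharp
using System;

public sealed class TokenClock
{
    private readonly TimeSpan lifetime;
    // MinValue means "no token yet", which always needs a refresh.
    private DateTime issuedAt = DateTime.MinValue;

    public TokenClock(TimeSpan lifetime)
    {
        this.lifetime = lifetime;
    }

    // Call right after a new token has been obtained.
    public void MarkIssued(DateTime now)
    {
        issuedAt = now;
    }

    // True when no token was issued yet or the lifetime has elapsed.
    public bool NeedsRefresh(DateTime now)
    {
        return now - issuedAt >= lifetime;
    }
}
```

Passing the current time in as a parameter, rather than reading `DateTime.Now` inside the class, keeps the check easy to exercise with fixed timestamps.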
  • I tried both your code and my other prototype, where I get a new token every time the application runs a translation.

    I also tried registering an application (https://datamarket.azure.com/developer/applications) with the same client ID I got when I subscribed to the Translator API. The application shows up at the above URL but not under https://datamarket.azure.com/account/applications

    Still getting same error.

    Thanks.


    • Edited by ksiu Monday, July 30, 2012 6:12 PM
    Monday, July 30, 2012 5:20 PM
  • Now it seems to be working after registering an application at https://datamarket.azure.com/developer/applications with the client ID I used to subscribe to the Translator API from https://datamarket.azure.com/account/, and using the client secret from the "developer" URL rather than the one from https://datamarket.azure.com/account/applications.

    I do not understand why my other application, which uses the proxy class (http://code.msdn.microsoft.com/Walkthrough-Translator-in-7e0be0f7#content) with the client secret from https://datamarket.azure.com/account/applications, still works without registering an application, while https://datamarket.accesscontrol.windows.net/v2/OAuth2-13 requires application registration and the client secret from the "developer" URL.

    What is the correct process flow?

    Also, I have a question about transactions.

    As it loops through each node and calls the translation API, each transaction would be 1000 characters (2000 transactions/month with the free offering). Would this be more costly in terms of transactions than passing the entire document to the translation API?

    Thank you for all your help.


    • Edited by ksiu Tuesday, July 31, 2012 3:01 AM
    Monday, July 30, 2012 9:38 PM
  • I am not sure why you would see a difference when using proxy classes...I always use both the client id and the client secret as documented here: http://blogs.msdn.com/b/translation/p/gettingstarted1.aspx without any problems.

    As for the transaction -- the costs should be the same (or maybe a little less) this way, because you are passing in the raw characters to translate, and nothing more.

    Wednesday, August 1, 2012 3:24 PM
  • Laurence,

    If I pass the following HTML document to the translation API directly, without using HtmlAgilityPack, I think it would be a single transaction of a minimum of 1000 characters.

    <!DOCTYPE html>
    <html>
    <body>
    <h1>Good morning</h1>
    <p>Thanks.</p>
    </body>
    </html>

    Whereas with HtmlAgilityPack, it would call the translate method once for each node, "Good morning" and "Thanks.", hence 2 transactions of 1000 characters each.

    Please correct me if I am wrong.

    Thanks.

    Wednesday, August 1, 2012 5:21 PM
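
To make the two approaches concrete, here is a rough sketch that lists the text each per-node call would submit for the sample document above. It does not settle the billing question (whether each call is rounded up to some minimum is not something this thread confirms); it only shows how many characters each approach sends. `CharCount.TextNodes` is a hypothetical helper, and the regular expression is a crude stand-in for the HtmlAgilityPack text-node query in the posted code, adequate only for simple markup like this sample.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharCount
{
    // Splits the markup on tags and keeps the non-empty text pieces,
    // roughly what //text()[normalize-space(.) != ''] selects.
    public static string[] TextNodes(string html)
    {
        return Regex.Split(html, "<[^>]+>")
                    .Select(s => s.Trim())
                    .Where(s => s.Length > 0)
                    .ToArray();
    }
}
```

For the sample document, `TextNodes` yields "Good morning" and "Thanks.", 19 characters in total across two calls, while sending the whole document in one call submits every character of the markup as well.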