UCMA, Speech Server, SDK, SAPI, TellMe - What to use to allow web applications to perform interactive TTS
Saturday, February 12, 2011 12:27 AM
We have spent a fair bit of time searching the web and this site, but our current requirement is to integrate the speech engines, mainly TTS (and ASR in the future), so that they are programmable/usable from a web app written in C# or similar, allowing users of the web GUI front end to perform interactive TTS conversions of the text they type in a form field/text box on the web-based GUI.

Many of the TTS engines we've tried so far, incl. Nuance, Loquendo, Cepstral, Festival, etc., offer a number of voice fonts. However, if a user types a non-English name in Roman script, e.g. a Chinese name like Zheng, an Indian name like Rakesh, or a Russian name like Vitaly, a regular English-language voice mispronounces it so severely that it's hard to even tell what was said. The only workaround is to type phonetically; e.g. Vitaly needs to be written as Vi-Taa-Lee for it to be pronounced correctly.

So we want non-English users to be able to try these combinations out and, when they like the result, click "Save" so that the app stores the generated audio stream (whatever the speech server produces, e.g. a WAV or an MP3) in a SQL Server DB or similar for later access. We're assuming that for this we need an API and/or SDK to interact with the speech engine.
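For illustration, here is a minimal sketch of the flow we have in mind, assuming the in-box .NET System.Speech API (a managed wrapper over SAPI); the method and file names here are our own invention, and we don't know whether this is the recommended API for server-side use:

```csharp
// Sketch only: synthesize typed (possibly phonetically respelled) text to an
// in-memory WAV stream whose bytes could then be stored in SQL Server,
// e.g. in a VARBINARY(MAX) column.
using System;
using System.IO;
using System.Speech.Synthesis;

class TtsSketch
{
    // Hypothetical helper: render text with the default voice to WAV bytes.
    static byte[] SynthesizeToWavBytes(string text)
    {
        using (var synth = new SpeechSynthesizer())
        using (var stream = new MemoryStream())
        {
            synth.SetOutputToWaveStream(stream); // WAV output; MP3 would need re-encoding
            synth.Speak(text);                   // synchronous synthesis
            return stream.ToArray();             // bytes ready for DB storage
        }
    }

    static void Main()
    {
        // Phonetic respelling as described above: "Vi-Taa-Lee" for "Vitaly".
        byte[] wav = SynthesizeToWavBytes("Vi-Taa-Lee");
        File.WriteAllBytes("vitaly.wav", wav);
    }
}
```

In the real app, the "Save" button would presumably post the approved text back to the server, which would synthesize it as above and write the resulting bytes to the database instead of to a file.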
But there are so many products and versions in the Microsoft portfolio, as if they can't figure out how to really position this. So many terms, as mentioned in the subject, are floating around, and we can't find one place that explains the differences between them: what exists, what doesn't, and which one we're supposed to be using. All we understand is that UCMA 3.0 includes the entire framework along with the Speech API and allows applications to interact with the new Lync Server, etc. We also understand that Speech Server 2007 is now dead.
What we don't know is the deal with the SDK, SAPI 5.4, and TellMe. Can these be used for our purpose? It's also likely that we'll soon introduce a service with ASR in it over the phone, so we'll need the speech platform to talk to a session border controller or some other SIP-based platform. Would it be better for us to use MRCP for this project, and do UCMA and/or Lync Server support MRCP?
We didn't know which forum to post this in, so we're posting it here. Would anyone be able to help?
Thanks so much,