Prosody Tag

  • Question

  • Hi,

     

    When using the prosody tag, it seems to affect only the first few words within the tag.

    For example:

     

    statementActivity1.MainPrompt.AppendSsmlMarkup("<prosody rate=\"+100%\">this should be read fast</prosody> : <prosody rate=\"+100%\">this should be read fast as well because it is in an identical prosody tag</prosody>");

     

    The above code appears to read the text in the first prosody tag OK. However, when you hear the text in the second prosody tag (which is longer), the effect seems to trail off.

     

    Also:

      

    statementActivity1.MainPrompt.AppendSsmlMarkup("<prosody volume=\"10\">this should be read quietly</prosody>");

     

    The above line of code does something similar: it reads the first couple of words quietly but then picks back up to the normal volume.

     

    There is also a pitch property on the prosody tag but I've had no joy in getting this to sound any different to the default.

     

    Is this a bug or am I doing something wrong with this? We are hoping to use the tag to slow the speech down when a caller may wish to write down any reference / contact details.

     

    Thanks in advance,

     

    Kev

    Friday, May 11, 2007 1:43 PM

All replies

  • I tested this myself and it does seem to be a bug. For example:

     

    statementActivity1.MainPrompt.AppendSsmlMarkup("<prosody volume=\"1\">this should be read quietly</prosody>");

     

    Only the first word is read at the specified volume, which is set so low that I can't hear it.

     

    The same with

    statementActivity1.MainPrompt.AppendSsmlMarkup("<prosody rate=\"+800%\">this should be read fast</prosody>");

     

    Only the first word is read at the high rate, which is so high that I can't understand it, while I can understand the rest of the words.

     

     

    You have a couple of alternatives. My suggestion is that you really shouldn't change the rate, but instead add small to medium breaks between sets of information:

    statementActivity1.MainPrompt.AppendText("Our address is");

    statementActivity1.MainPrompt.AppendBreak(Microsoft.SpeechServer.Synthesis.PromptBreak.Small);

    statementActivity1.MainPrompt.AppendText("1 2 3 main street");

     

    This can really make a difference and is probably better than changing the rate. Let me know if this works for you or not. If it doesn't, I'll give you a couple of other workarounds.
    Wednesday, May 16, 2007 7:22 AM
  • Hi Michael,

     

    Thanks for the response.

     

    I was kind of hoping we could do this without the need of coding a slower rate to be honest.

     

    Just to give you a bit of background on why we were trying to use the prosody tag. Part of our application reads some fields from a table and we are using a separate table to handle the SPOKEN_BEFORE_TEXT and SPOKEN_AFTER_TEXT. This method allows us to use a simple loop in our App to build up the full spoken string.

     

    One of the fields that we read out is a Reference Number. For example:

     

    SPOKEN_BEFORE_TEXT = "Please use the reference number "

    REFERENCE = "ABC/1234"

    SPOKEN_AFTER_TEXT = " when contacting us."

     

    So our spoken text would be "Please use the reference number ABC/1234 when contacting us."

     

    However, the reference field is read out too quickly - hence the idea of utilising the prosody tag.

     

    We could format the reference to enforce a pause, e.g. "A : B : C : / : 1 : 2 : 3 : 4" or "A<break time="50ms" /> B......etc", but adding the prosody tag to our SPOKEN_BEFORE_TEXT, and closing it in our SPOKEN_AFTER_TEXT, seemed to be the easiest option, e.g.

     

    SPOKEN_BEFORE_TEXT = "Please use the reference number <prosody rate=\"-50%\">"

    REFERENCE = "ABC/1234"

    SPOKEN_AFTER_TEXT = "</prosody> when contacting us."

     

    We were hoping to use this method for a few fields, such as contact name, contact address etc. Is the bug likely to be fixed? Is there another alternative to the colon separator / break tag methods that wouldn't require code changes?
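    For reference, the break-tag formatting mentioned above could be sketched roughly as follows. This is only an illustration: the class name `ReferenceSsml`, the method name `WithBreaks` and the break length are assumptions, not part of our application or of Speech Server. The returned fragment is meant to be passed to something like `AppendSsmlMarkup`.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Security;

public static class ReferenceSsml
{
    // Hypothetical helper: returns an SSML fragment in which the non-space
    // characters of the reference are separated by short <break/> elements.
    public static string WithBreaks(string reference, int breakMs)
    {
        List<string> parts = new List<string>();
        foreach (char c in reference)
        {
            if (char.IsWhiteSpace(c))
            {
                continue; // spaces are dropped; the breaks provide the pauses
            }
            // Escape &, < and > so a character in the data cannot break the markup.
            parts.Add(SecurityElement.Escape(c.ToString()));
        }

        string separator = "<break time=\""
            + breakMs.ToString(CultureInfo.InvariantCulture)
            + "ms\"/> ";
        return string.Join(separator, parts.ToArray());
    }
}
```

    With this, `ReferenceSsml.WithBreaks("ABC/1234", 50)` yields "A", "B", "C", "/", "1", "2", "3", "4" with a 50 ms break element between each pair. It still counts as a code change, which is what we were hoping to avoid.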

     

    Kind regards,

     

    Kevin

    Wednesday, May 16, 2007 10:30 AM
  • Weird, I've tried this and I can't repro it.

     

     

    What MSS build is this on? Is this during debugging or via a real call?

     

    Here's what I'm trying:

    Code Snippet

    private void statementActivity1_TurnStarting(object sender, TurnStartingEventArgs e)
      {
          statementActivity1.MainPrompt.AppendSsmlMarkup("<prosody rate=\"+100%\">The Speech Application Language Tags (SALT) 1.0 specification enables multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The Speech Application Language Tags extend existing mark-up languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently.</prosody>");
      }

     

    Wednesday, May 16, 2007 7:32 PM
  • I'm using Microsoft (R) Office Communications Server 2007, Speech Server Version 2.0.61285.0 (Version number that is listed in the Help --> About in VS) and it happens using the SIP Debugging phone.

     

    If I use your code snippet, the same thing happens: it goes back to the normal rate after the first word.

    Thursday, May 17, 2007 12:15 AM
  • I'm using the same version as Michael - 2.0.61.285.0 with the SIP Debugging phone.

     

    I also tried the code snippet and it slows back down to the default rate after the first few words. It seems to read "The Speech Application" very quickly, but from the word "Language" (4th word) onwards it's back to the normal rate.

     

    Thanks,

     

    Kev

    Thursday, May 17, 2007 7:16 AM
  • OK, most likely I'm on a later version, which would indicate that this is fixed. I'll install the Beta version and make sure.
    Thursday, May 17, 2007 7:18 PM
  • I am also experiencing this problem with the Beta version.  Has this issue been addressed in the RTM version?

     

     

    Friday, August 24, 2007 8:50 PM
  • I just installed the latest RTM version of OCS and verified that this bug has not been addressed.  Anyone know if this issue will ever be addressed?  Has anyone discovered a work-around?

     

    Monday, August 27, 2007 9:02 PM
  • Hi,

     

    I never received confirmation that this issue had been fixed in RTM (which I haven't installed yet). I assumed that with Ahmed being unable to repro it on his later version that the fix would be included in RTM but from your findings, this doesn't seem to be the case.

     

    The only work-around that we applied was to separate the text using colons. This worked fine with our Reference Number but I don't think this is a suitable work-around if you require actual "words" to be read slowly.

     

    The obvious downside to the work-around is that you need to run your spoken text through a regex or something similar.
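    As a rough idea of what that regex step can look like, here is a minimal sketch. The class and method names are made up for illustration and it assumes plain ASCII references such as "ABC/1234"; it is not the exact code from our application.

```csharp
using System.Text.RegularExpressions;

public static class ColonSeparator
{
    // Hypothetical helper: strips whitespace, then puts " : " between every
    // pair of adjacent characters so the TTS engine pauses after each one.
    public static string Separate(string reference)
    {
        string compact = Regex.Replace(reference, @"\s+", "");
        // Zero-width match: any position with a character on both sides.
        return Regex.Replace(compact, @"(?<=.)(?=.)", " : ");
    }
}
```

    For example, `ColonSeparator.Separate("ABC/1234")` returns "A : B : C : / : 1 : 2 : 3 : 4".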

     

    I would be interested to know if this will be fixed too.

     

    Kind regards,

     

    Kevin

    Tuesday, August 28, 2007 11:19 AM
  • I just downloaded and installed the RTM version and it's still working fine for me using the code I've previously posted and calling in via the debugging console.

     

    How are you using the tag?

     

     

     

    Wednesday, August 29, 2007 1:13 AM
  • What OS are you seeing this behaviour on?  We can only repro it on XP, not on Win2k3.  If this holds true for you, it's only a problem during development, and not production.  As such, the workaround is to use Win2k3 to verify its behaviour.

    Thursday, August 30, 2007 4:07 PM
  • Hi guys,

     

    I was experiencing this on Win2k3, as we were using that on our dev machines, but that was using build 2.0.61.285.0. I haven't had a chance to install RTM yet as I am working on another project. I will try to get some time to try it though.

     

    Obviously I don't know what OS rsponden was running whilst using RTM.

     

    Thanks,

     

    Kev

    Friday, August 31, 2007 8:12 AM
  • I am experiencing this problem on a Win2k3 server running OCS RTM build 2.0.61363.0.  Here's an example of how I'm using the tag to play addresses:  "<ssml:prosody rate = \"x-slow\">" + address + "</ssml:prosody>"

     

    I have written some code that replaces the spaces in the address with colons.  This helps slow down the TTS playback but it sounds a little jumpy.

    Wednesday, September 5, 2007 8:52 PM
  • I just tried the sample I have previously posted and it stays slow throughout playback.

     

    Can you post some sample text that is exhibiting this problem?

     

    Does the example I posted exhibit this problem on your system?

     

    Monday, September 10, 2007 5:18 PM
  • Here is some sample text exhibiting this problem:

    "<ssml:prosody rate =\"x-slow\">4532 Main Street West Suite 123 Dallas TX 75234</ssml:prosody>";

     

    The "For" portion of 4532 is spoken very slowly but, by the time the "Two" in 4532 is spoken, it's already back up to full speed.  The remainder of the address is played at full speed.

     

    Thursday, September 13, 2007 5:41 PM
  •  

    OK, that works fine for me via the debugger.

     

    Are you doing any pre-processing to the string before the speak request?

     

    Does the example I posted exhibit the same behaviour on your system?

    Thursday, September 13, 2007 8:05 PM
  • No pre-processing is being done to the string.

     

    I can't use the posted example because my application is not a workflow application.  My application is an upgraded SALT application that was originally running on an MSS 2004 system.  Could that be my problem?

     

    Thursday, September 13, 2007 9:33 PM
  • I just developed a workflow application and tested Ahmed's example.  The first three words, "The Speech Application", were spoken rapidly.  The fourth word, "Language", and everything thereafter was spoken at the normal rate.  This test verifies that this behavior is not specific to SALT applications.

     

     

    Thursday, September 20, 2007 5:27 PM
  • Was this a clean install of the RTM version or were previous versions on the machine?

     

    Was your test executed via the debugging console?

     

    Are you encountering this issue on other machines?

    Thursday, September 20, 2007 8:06 PM
  • I'm experiencing this same behavior on two different OCS systems.  Both systems were originally running the OCS Beta Release. 

     

    On one system, we just un-installed the Beta software, and installed the OCS RTM. 

     

    On the other system, we completely reformatted the hard-drive, and did an OCS RTM clean install.

     

    I've tested using the debugging console on my desktop and by placing actual calls into the OCS systems.  The result is the same.  Could that indicate that the problem is in my development environment and not the OCS?

     

     

    Thursday, September 20, 2007 8:55 PM
  • Well I don't want to rule anything out.

    Are you using a TIM perhaps?

    A clean install on a machine didn't repro the issue for me so we just need to figure out what's different.

    Can you tell us about your environment?
    Friday, September 21, 2007 3:58 AM
  •  

    No TIMs.  Both OCS systems are running the RTM release version 2.0.61363.0 on Windows Server 2003.  We've got one OCS sitting behind an Intel PING80LS media gateway.  We've got another OCS sitting behind a newer Dialogic PING80LS gateway.  Both gateways sit behind our Panasonic DBS576 PBX. 

    In order to eliminate the gateways and PBX from the equation, I just made test calls directly to both OCS systems using X-Lite.  The TTS behavior is consistent no matter if I use the debugging console, X-Lite, or a land-line phone.

     

    Friday, September 21, 2007 4:00 PM
  • I just wrote a VXML application that plays TTS using the prosody element and it works perfectly!  The prosody element only works on the first few words of the TTS strings in my SALT and Workflow applications.  I am at a loss to explain why it works in VXML.

     

    Thursday, September 27, 2007 9:10 PM
  • Hi,

     

    Just for info, when I first reported this issue on the BETA version, I too was writing a Workflow application so you may be on to something here.

     

    Thanks,

     

    Kev

    Monday, October 1, 2007 6:45 AM
  • We have finally made some progress on this issue; my sincere apologies for how long this has taken.  We have found that the key to this issue is whether or not a prompt database is included in the SSML.  For workflow applications, if there is a prompt database associated with the solution, the Statement activity automatically adds the prompt database.

     

    So, the simplest workaround is to remove the prompt database from the solution.  This is clearly not acceptable where the prompt database is needed for other prompts.  In this case, the workaround is to separate prompts into several Statement activities such that any portion of a prompt which requires use of prosody is in its own Statement activity.  This Statement activity would be preceded by a Code activity which disables the prompt database (property PromptDatabase on the workflow), and followed by a Code activity which reactivates the prompt database.  So, using the original example in this thread, there would be the following activities:

     

    Statement(SPOKEN_BEFORE_TEXT)

    Code (Uri cachedPromptDatabase = PromptDatabase; PromptDatabase=null)

    Statement(REFERENCE)

    Code (PromptDatabase=cachedPromptDatabase;)

    Statement(SPOKEN_AFTER_TEXT)

     

    [ Side note: the TurnStarting event can be used instead of a preceding Code activity to disable the PromptDatabase. ]
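    To make the two Code activities a little more concrete, here is a minimal sketch of the save/disable/restore pattern. The class below is only a stand-in for the workflow class, modelling nothing but the PromptDatabase property mentioned above, and the handler names are hypothetical. Note that the cached value is kept in a field rather than a local variable, since the two handlers run separately.

```csharp
using System;

// Stand-in for the speech workflow class; only the PromptDatabase
// property described above is modelled here (an assumption for illustration).
public class ReferenceWorkflowSketch
{
    public Uri PromptDatabase { get; set; }

    // Field, not a local: the value must survive between the two handlers.
    private Uri cachedPromptDatabase;

    // Handler for the Code activity placed before Statement(REFERENCE).
    public void DisablePromptDatabase_ExecuteCode(object sender, EventArgs e)
    {
        cachedPromptDatabase = PromptDatabase;
        PromptDatabase = null;
    }

    // Handler for the Code activity placed after Statement(REFERENCE).
    public void RestorePromptDatabase_ExecuteCode(object sender, EventArgs e)
    {
        PromptDatabase = cachedPromptDatabase;
    }
}
```

    In a real workflow these two methods would live on the workflow class itself and be wired to the Code activities on either side of the Statement activity that needs prosody.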

     

    Also, some additional tips:

    • you may wish to use the SSML spell-out say-as tag for REFERENCE=ABC/1234
    • there are methods on PromptBuilder to avoid you writing SSML.

    Together, you could author your activity as:

     

    statementActivity1.MainPrompt.StartStyle(new PromptStyle(PromptRate.Slow));

    statementActivity1.MainPrompt.AppendTextWithHint("ABC/1234", SayAs.SpellOut);

    statementActivity1.MainPrompt.EndStyle();

    Monday, October 8, 2007 7:32 PM