Poor quality of voice recognition within a wav file

  • General discussion

  • I'm trying to extract text from wav files (for now). The code "works"; however, the extracted text is of very poor quality regardless of the wav file used (anything beyond a simple 2-3 second sample). See the code below. I am using DictationGrammar, which seems most appropriate. The system is "untrained" because the wav files' origin will be random (voicemail messages, for example).

    Is there something I should be doing in particular or is this just how it is (i.e. useless like screen doors on a submarine)?

     

     

    // Requires: using System; using System.IO; using System.Speech.Recognition;
    DictationGrammar dictationGrammar = new DictationGrammar();

    using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
    {
        engine.LoadGrammar(dictationGrammar);

        // Feed the wav file to the recognizer as a stream.
        using (Stream file = new FileStream(this.txtFilePath.Text, FileMode.Open, FileAccess.Read))
        {
            engine.SetInputToWaveStream(file);

            try
            {
                // Recognize() performs a single synchronous recognition operation.
                RecognitionResult result = engine.Recognize();
                if (result != null)
                {
                    this.txtOutput.Text = result.Text;
                }
                else
                {
                    this.txtOutput.Text = "Returned NULL";
                }
            }
            catch (Exception ex)
            {
                // Show the error instead of silently discarding it.
                this.txtOutput.Text = "Recognition failed: " + ex.Message;
            }
        }
    }

    Thursday, September 23, 2010 2:07 PM

All replies

  • I'm afraid it is a bit like screen doors. Dictation is extremely difficult with an untrained system. Untrained speech recognition only works well when it has an associated grammar that limits what can be said, and even then, the larger the grammar, the worse the accuracy. Google Voice transcribes voicemail, and how poor that transcription is has become something of a joke; it sometimes gets no more than 5% of a message correct.


    - Marc LaFleur
    Thursday, October 14, 2010 4:45 AM
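
To illustrate the point about an associated grammar, here is a minimal sketch of recognizing a wav file against a small fixed phrase list instead of DictationGrammar. The phrases and the command-line wav path are hypothetical placeholders, not taken from the original post, and whether accuracy is acceptable will still depend on the audio. It assumes a reference to the System.Speech assembly on Windows.

```csharp
using System;
using System.Speech.Recognition;

class ConstrainedGrammarSketch
{
    static void Main(string[] args)
    {
        // Hypothetical input: path to a wav file passed on the command line.
        string wavPath = args[0];

        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            // Hypothetical phrase list; a real application would use its own.
            Choices phrases = new Choices(
                "call me back",
                "leave a message",
                "cancel my appointment");

            GrammarBuilder builder = new GrammarBuilder(phrases);
            // Keep the grammar's culture in line with the recognizer's culture.
            builder.Culture = engine.RecognizerInfo.Culture;

            engine.LoadGrammar(new Grammar(builder));
            engine.SetInputToWaveFile(wavPath);

            // Single recognition pass; null means nothing matched the grammar.
            RecognitionResult result = engine.Recognize();
            if (result != null)
            {
                Console.WriteLine("{0} (confidence {1:F2})", result.Text, result.Confidence);
            }
            else
            {
                Console.WriteLine("No phrase from the grammar was recognized.");
            }
        }
    }
}
```

The Confidence value gives a rough way to reject weak matches. A small grammar like this cannot transcribe arbitrary voicemail, which is the trade-off described in the reply above.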