Voice recognition technology for persons who have motoric disabilities

Journal of Rehabilitation, April-June, 1994 by Tanya Goette, Jack T. Marchewka

Connected-word systems, on the other hand, allow for short utterances of multiple words, but each utterance requires definite beginning and ending points. This type of VRT system allows the individual to say several words before pausing. Because connected-word systems follow natural speech patterns more closely, they require more complex matching algorithms, and the level of accuracy is reduced.

The most complex voice recognition systems allow for true continuous speech. The ultimate goal of VRT is to have a computer that recognizes speech in much the way humans do. Although products to support true continuous speech are still being developed, products do exist that allow for script-read input and one-sentence utterances (Clements, 1987).

VRT systems also vary in terms of vocabulary size. Smaller vocabularies are easier to train, while larger-vocabulary systems require highly complex recognition algorithms. Moreover, if connected-word systems use large vocabularies, the number of possible word orderings increases exponentially and requires more sophisticated hardware (Clements, 1987).
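
The growth in possible word orderings can be made concrete with a little arithmetic. As a rough sketch, assuming that any word may follow any other (real systems usually restrict this with a grammar), a vocabulary of V words allows V to the power n distinct utterances of n words. The function name and the vocabulary sizes below are illustrative choices and are not taken from Clements (1987).

```python
def possible_utterances(vocabulary_size: int, words_per_utterance: int) -> int:
    """Count distinct word sequences, assuming any word may follow any other."""
    return vocabulary_size ** words_per_utterance


# Compare a small, a medium, and a large vocabulary for utterances of 1 to 3 words.
for vocab in (100, 1_000, 30_000):
    counts = [possible_utterances(vocab, n) for n in (1, 2, 3)]
    print(f"{vocab:>6} words: {counts}")
```

Under this assumption, each extra word in an utterance multiplies the number of candidates by the vocabulary size, which is why large vocabularies weigh so heavily on connected-word systems.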

In addition, VRT systems are either speaker-dependent or speaker-independent. Speaker-dependent systems require an individual to train a vocabulary or list of specific words that are used by the system. In short, the system must be told who is using it. With large vocabularies, speaker training time becomes a problem, since a tradeoff exists between the amount of word training required and the accuracy of recognition. On the other hand, individuals with speech impediments can train speaker-dependent systems to understand their pronunciations.

Speaker-independent systems have models for word recognition built into the system. Training each word in the vocabulary is not required, but considerable accuracy may be lost. Phone companies are experimenting with speaker-independent VRT applications that allow only a limited vocabulary.

Speaker-adaptive systems provide a compromise between speaker-independent and speaker-dependent systems. Adaptive systems include basic word models. An individual trains only certain words (or utterances) that contain the syllables needed to make up other words. The system then adapts the original models to the speaker's voice as the speaker uses the system.
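
The idea of adapting a built-in model to one speaker can be illustrated with a minimal sketch. It assumes, purely for illustration, that a word model is a short list of numeric features and that adaptation is a running average that nudges the stored model toward each new utterance. The function `adapt_model`, the `rate` parameter, and the feature values are hypothetical; this is not a description of how DragonDictate or any other product adapts its models.

```python
from typing import List


def adapt_model(model: List[float], observation: List[float], rate: float = 0.2) -> List[float]:
    """Move each stored feature a fraction `rate` of the way toward the observed value."""
    if len(model) != len(observation):
        raise ValueError("model and observation must have the same length")
    return [m + rate * (o - m) for m, o in zip(model, observation)]


# Hypothetical built-in model for one word, and one speaker's typical features for it.
model = [0.0, 0.0, 0.0]
speaker = [1.0, 2.0, -1.0]

# Each use of the system pulls the model a little closer to the speaker's voice.
for _ in range(10):
    model = adapt_model(model, speaker)
print(model)
```

A small `rate` changes the model slowly and resists one-off variations such as a cold, while a large `rate` follows the speaker quickly but is more easily thrown off by a single unusual utterance.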

Systems vary in the amount of background noise they filter out. With some VRT systems, problems arise if even two or three people are talking in the background while someone is using the system. Machinery noise can also affect the recognition accuracy of VRT systems.

Systems also have differing degrees of robustness, which is the ability of the system to cope with variations in speech. The performance of the system may decrease when the voice becomes fatigued or when emotion is reflected in the voice. For example, the system may have a lower accuracy rate for recognizing words when the user shouts, speaks angrily, delays utterances, or has a cold.

Experiences with VRT

Parts of this paper were written and edited using DragonDictate(TM), a product of Dragon Systems, Inc. The system may be used with most microcomputer software and has a 30,000-word vocabulary. The system is speaker-adaptive and requires some training for each individual user.


 

Content provided in partnership with Thomson Gale