Have you heard the news about speech recognition and the hand-held Web?

Communications News, Feb. 2001, by Klaus Schleicher

One day soon, your voice will be your universal access device.

Hand-held devices for Internet access are stylish, portable and, of course, small. The problem is that users want them to do many of the same tasks PCs can do, such as providing access to e-mail and the Internet. The smaller they get--with restricted keypads and truncated LCD displays--the more awkward they are to operate. Many industry observers say speech-recognition and text-to-speech technology will solve the problem of providing a usable interface for these devices without a keyboard or mouse.

Inherent limitations have not stopped the devices from becoming wildly popular. Some seven million users logged on to the Internet with wireless devices in 1999, notes research compiled by Claudine Zap Freidberg. According to International Data Corp., 61.5 million users are expected to be using wireless devices with Internet service in 2003.

Speech technology would seem an ideal complement to these Web appliances. Advances in speech processing have already given users a far more natural-sounding voice for reading back e-mail and Web information, including whole Web pages. The real advances, however, are just over the horizon.

Speech recognition for hand-held Internet access will truly come into its own when combined with intelligent software assistants and intelligent content management. These emerging software tools can narrow Internet search paths to a fine point and trim intermediary steps from tasks such as online shopping, reducing multistep processes to a brief series of voice commands.

SOUNDS LIKE PROGRESS

Some new hand-held devices already use speech-recognition technology to integrate e-mail, telephone, pager and fax functions behind a speech user interface rather than a graphical or keypad-based one. Next-generation "smart-phone" applications and related wireless services are now appearing that provide access to e-mail, Internet and corporate intranet information from a cellular handset.

Additionally, developers are enhancing automatic speech-recognition and text-to-speech technology with natural language understanding (NLU). NLU is a next-generation speech technology that enables applications not only to recognize words but also to understand them in context; paired with text-to-speech, such applications can then respond in a pleasant voice. This advance also represents the first step toward true natural-language dialogue systems that enable two-way conversations with computers--essential for future Web searching with speech-enabled hand-held devices.

New applications combine improved text-to-speech (TTS) and automatic speech-recognition (ASR) engines with natural-language processing. ASR converts the user's speech into a text sentence of distinct words. TTS converts a text sentence into computer-generated speech. Mediating between them, NLU enables the computer to understand what the user is saying. Together, these technologies make it possible for applications to interact with people through spoken language, eliminating the need for prerecorded voice files or manual input devices.
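To make the division of labor concrete, here is a minimal Python sketch of the ASR, NLU and TTS chain described above. It is a toy under stated assumptions: `asr`, `nlu`, `respond` and `tts` are hypothetical stand-ins, the "audio" is simply UTF-8 text so the example runs without a speech engine, and understanding is done with a few hand-written rules covering commands mentioned in this article, not with a real NLU model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def asr(audio: bytes) -> str:
    # Hypothetical ASR stub: a real engine would decode speech into words.
    # Here the "audio" is UTF-8 text, normalized to lowercase.
    return audio.decode("utf-8").strip().lower()


def nlu(text: str) -> Intent:
    # Toy rule-based NLU: map the recognized sentence to an intent and slots.
    if text in ("next message", "next"):
        return Intent("next_message")
    if text.startswith("send e-mail"):
        return Intent("send_email")
    match = re.match(r"what is the weather in ([a-z][a-z ]*)\??$", text)
    if match:
        return Intent("weather", {"city": match.group(1).strip().title()})
    return Intent("unknown")


def respond(intent: Intent) -> str:
    # Choose the text reply for each understood intent.
    if intent.name == "next_message":
        return "Reading your next message."
    if intent.name == "send_email":
        return "Who should the e-mail go to?"
    if intent.name == "weather":
        return f"Looking up the weather in {intent.slots['city']}."
    return "Sorry, I did not understand that."


def tts(text: str) -> bytes:
    # Hypothetical TTS stub: a real engine would return synthesized audio.
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    # Speech in, speech out: ASR -> NLU -> response text -> TTS.
    return tts(respond(nlu(asr(audio))))


print(handle_utterance(b"What is the weather in Fresno?"))  # b'Looking up the weather in Fresno.'
```

Because each stage has a narrow interface (audio in and text out, text in and intent out), the ASR and TTS stubs could in principle be replaced by server-side engines without touching the dialogue logic.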

As processors and memory have continued to grow in capacity and drop in price, developers have used larger voice segments, which make it easier to produce natural-sounding speech. At the same time, developers have broken new ground in joining these voice segments smoothly to create a more natural-sounding synthetic voice. Gone is the robotic-sounding synthesized voice produced by formant synthesis in the 1980s and 1990s.
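The joining step can be illustrated with a simplified sketch. It assumes each recorded voice segment is a list of audio samples and blends adjacent segments with a linear crossfade over a fixed overlap. `crossfade_join` and `join_segments` are hypothetical names, and production synthesizers use more careful methods, such as pitch-synchronous overlap-add, to match pitch and timing at segment boundaries.

```python
def crossfade_join(a, b, overlap):
    """Join two sample sequences, blending the last `overlap` samples of `a`
    into the first `overlap` samples of `b` with a linear crossfade."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter segment length")
    split = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises toward 1 across the overlap
        blended.append((1 - w) * a[split + i] + w * b[i])
    return list(a[:split]) + blended + list(b[overlap:])


def join_segments(segments, overlap):
    """Concatenate a non-empty list of recorded segments into one utterance."""
    out = list(segments[0])
    for seg in segments[1:]:
        out = crossfade_join(out, seg, overlap)
    return out


print(crossfade_join([1.0] * 4, [0.0] * 4, 3))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Each join shortens the result by `overlap` samples, because the boundary region is shared by both segments; gradually shifting the weight from one segment to the next is what removes the audible click of a hard splice.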

In addition, the newest synthesizers, combined with new ASR technology, enable the computer to generate whatever question is needed to clarify spoken input. Boosted by the advances in TTS voice quality, developers are turning their attention to creating server-side natural-language dialogue systems that combine TTS with natural-language ASR for use with any wireless client device.
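One common way to decide which clarifying question to ask is slot filling: each type of request lists the pieces of information it needs, and the system asks about the first one still missing. The sketch below assumes that approach. The intents, slot names and prompt wording are hypothetical examples, and the user's replies are supplied as a list rather than coming from a live ASR engine.

```python
REQUIRED_SLOTS = {
    "buy_stock": ["symbol", "shares"],
    "send_email": ["recipient", "body"],
}

PROMPTS = {
    "symbol": "Which stock would you like to buy?",
    "shares": "How many shares?",
    "recipient": "Who should the e-mail go to?",
    "body": "What would you like the message to say?",
}


def first_missing(intent, slots):
    """Return the first required slot that has no value yet, or None."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            return slot
    return None


def run_dialogue(intent, slots, replies):
    """Ask one clarifying question per missing slot, filling slots from replies."""
    slots = dict(slots)
    pending = iter(replies)
    asked = []
    while (slot := first_missing(intent, slots)) is not None:
        asked.append(PROMPTS[slot])  # in a real system this text would go to TTS
        reply = next(pending, None)
        if reply is None:  # no more answers: leave the request incomplete
            break
        slots[slot] = reply
    return slots, asked


print(run_dialogue("buy_stock", {"symbol": "IBM"}, ["100"]))
# ({'symbol': 'IBM', 'shares': '100'}, ['How many shares?'])
```

Here the user's first utterance already named the stock, so the system asks only for the share count instead of walking through every field, which is the kind of step-trimming the article anticipates for voice commerce.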

A "concept" hand-held device debuted at Demo 2000 in Indian Wells, CA. The fully speech-enabled handset let users listen to e-mail summaries, as well as full-text e-mails, issue natural-language commands such as "next message" or "send e-mail," and dictate e-mails. A user with a wireless Web connection can use such a device to make simple Web queries--"What is the weather in Fresno?"--buy something online, or trade stock on E*TRADE or Charles Schwab.com.

Mobile phone users do not need to surf the Web--they need fast answers to specific queries. Speech-recognition and text-to-speech engines residing remotely on servers enable users to ask these questions, and hear instant answers in a pleasant, human-sounding voice.

With a speech-equipped wireless Web device, users no longer need to stop and use a stylus or hunt through the letters and numbers on a mobile phone's keypad. They gain instant access to specific Web content simply by speaking into the phone and asking for it, which makes Web-enabled mobile phones and small hand-held computers far easier to use.

EASIER LISTENING

The achievement of a truly natural-sounding human voice is already making current TTS and speech-recognition applications much more compelling. The future of the voice interface, however, hinges on the computer's ability to interact with users conversationally, as a person would. A person reading aloud can appreciate tone and meaning, and can convey humor, irony or the contextual meaning of a narrative's elements.


 

