Half The Equation: Open Standards Are A First Step Toward Speech Automation
Customer Inter@ction Solutions, Mar 2005 by Nguyen, Patrick
Today, well-engineered speech recognition systems achieve high customer satisfaction and high returns on investment in many customer service areas, including stock trading, flight information, catalog ordering and directory assistance. Although speech automation's potential has become widely recognized, few IT organizations have had the means to build or maintain speech systems, relying instead on expensive services from speech engine vendors or specialist system integrators. One major impediment to speech development efforts was removed when the industry adopted open standards and Web technologies familiar to mainstream IT organizations. However, a larger obstacle remains: speech development methodologies and tools must improve to address the unique demands of voice user interfaces before mainstream enterprises can reliably deliver high-quality speech systems at a reasonable cost.
The First Step: Open Speech Standards
The earliest development approaches required programming in the application program interface (API) specific to each speech recognition engine. This approach burdened developers with low-level, recognition engine-specific details such as exception handling and resource management. Moreover, the proprietary nature of these APIs restricted the flexibility with which enterprises could deploy applications. Most software components had to be sourced from a single vendor and deployed in a single location, and the resulting applications could not be easily ported to other platforms.
The advent of voice languages such as VoiceXML and SALT contributed to a Web-based development process. These languages allow a distribution of responsibilities in a speech system between a voice browser, which performs the speech recognition function, and a server application, which contains the application logic and user interface behavior (expressed in the voice language). As a result, application developers no longer concern themselves with speech engine API calls, but instead are responsible for generating documents that can be executed by the voice browser.
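To make this division of labor concrete, here is a minimal sketch of the server side: a function that generates a small VoiceXML 2.0 document for a voice browser to execute. The form name, field name, prompt wording and the `/status` submit target are hypothetical, chosen only for illustration; a real application would serve the document over HTTP from whatever Web framework the organization already uses.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_flight_status_document() -> str:
    """Return a small VoiceXML document as a string.

    All names below (form id, field name, submit URL) are illustrative
    assumptions, not taken from any real deployment.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "flight_status"})

    # A field collects one piece of information from the caller.
    # "digits" is one of the built-in grammar types in VoiceXML.
    field = ET.SubElement(form, "field", {"name": "flight_number", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say your flight number."

    # Once the field is filled, the voice browser posts the value
    # back to the server application (hypothetical URL).
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": "/status", "namelist": "flight_number"})

    return ET.tostring(vxml, encoding="unicode")


document = build_flight_status_document()
print(document)
```

Note that the server code contains no speech engine API calls: recognition, prompting and retries are left to the voice browser that interprets the document.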
VoiceXML (Voice Extensible Markup Language) is a standard endorsed by the World Wide Web Consortium (W3C) for speech application development. The first specification was released in March 2000 by the VoiceXML Forum (www.voicexml.org/), an industry body that now has 375 member companies, including IBM, Nuance, Motorola and AT&T. The latest version, VoiceXML 2.0, became a W3C recommendation in March 2004. VoiceXML voice browsers are already available through dozens of vendors; in all, a hundred or so vendors provide compliant products. Commercial VoiceXML deployments have been estimated in the thousands.
SALT is a newer standard, proposed by the SALT Forum (www.saltforum.org/), and is somewhat competitive with VoiceXML. The intent of SALT is to facilitate multimodal applications, allowing spoken interfaces to be used in conjunction with a keyboard and a display screen, so that Web pages can be accessed by different client devices. However, SALT can also be used to build voice-only applications, and one of its targets is to simplify speech application development. The major proponent of SALT is Microsoft, but many companies support both SALT and VoiceXML, including Intel, Cisco, HP and ScanSoft. Only a few SALT voice browsers are currently available. The most prominent is Microsoft's Speech Server, which has attracted developer interest due to its integration with Microsoft's .NET framework. To date, SALT has few publicly announced commercial deployments.
VoiceXML is a larger language that contains its own procedural and transport elements. In contrast, SALT is a lightweight extension to existing markup languages, most notably HTML and XHTML. SALT tags are embedded within the HTML DOM (document object model) event and scripting environment, a model familiar to Web developers. Dialog flow is managed by combining SALT elements with DOM object properties, methods and events. This programming approach is well-suited to multimodal applications because visual and speech elements on a Web document are peers. VoiceXML, on the other hand, has constructs designed specifically for speech-only interfaces, such as dialogs with predefined execution flows.
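The idea of a "predefined execution flow" can be illustrated with a toy simulation. The sketch below is a deliberately simplified stand-in for the way a VoiceXML form is interpreted: fields are visited in document order, and the first field that still lacks a value is prompted until it is filled. It is not the full algorithm from the specification (which also covers guard conditions, events and mixed-initiative grammars), and the `run_form` and `scripted_recognizer` helpers are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional


def run_form(
    fields: List[str],
    recognize: Callable[[str], Optional[str]],
    max_turns: int = 10,
) -> Dict[str, str]:
    """Visit fields in document order until every field has a value."""
    values: Dict[str, str] = {}
    for _ in range(max_turns):
        # Select the first field that has not been filled yet.
        pending = next((name for name in fields if name not in values), None)
        if pending is None:
            break  # every field is filled, so the form is complete
        result = recognize(pending)
        if result is not None:
            values[pending] = result
        # A None result models a failed recognition: the same field
        # is selected again on the next turn.
    return values


def scripted_recognizer(
    answers: Dict[str, List[Optional[str]]]
) -> Callable[[str], Optional[str]]:
    """Build a fake recognizer that replays canned replies per field."""
    queue = {name: list(replies) for name, replies in answers.items()}

    def recognize(field_name: str) -> Optional[str]:
        replies = queue.get(field_name, [])
        return replies.pop(0) if replies else None

    return recognize


# The caller is understood at once for "origin", but only on the
# second attempt for "destination".
recognize = scripted_recognizer(
    {"origin": ["Boston"], "destination": [None, "Denver"]}
)
result = run_form(["origin", "destination"], recognize)
print(result)
```

In VoiceXML this loop is supplied by the voice browser. In SALT the developer builds the equivalent flow from script and DOM events, which is more work for voice-only dialogs but leaves visual and spoken elements free to interact as peers.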
Despite the competition, SALT supports various W3C standards associated with the VoiceXML standard, including SRGS, the W3C speech recognition grammar specification; SSML, the W3C language for controlling TTS (text-to-speech) pronunciation, emphasis and intonation; and ECMAScript, the scripting language specification. Moreover, SALT has been submitted to the W3C's Voice Browser working group, and some of its concepts may be incorporated into the next VoiceXML standard.
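As an illustration of one of these shared standards, the following sketch builds a short SSML fragment that adds emphasis and a pause to synthesized speech. The `build_ssml` helper and the sample wording are assumptions made for this example; the `speak`, `emphasis` and `break` elements and the namespace come from the W3C SSML 1.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(lead_in: str, key_phrase: str) -> str:
    """Return an SSML document that stresses key_phrase, then pauses."""
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"},
    )
    speak.text = lead_in + " "

    # Ask the TTS engine to stress the important words.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = key_phrase
    emphasis.tail = "."

    # Insert a half-second pause after the sentence.
    ET.SubElement(speak, "break", {"time": "500ms"})

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Your flight departs from gate", "B twelve")
print(ssml)
```

Because both VoiceXML and SALT accept SSML for prompts, a fragment like this could in principle be reused unchanged under either language.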
VoiceXML and SALT are both presentation layer languages that deliver a number of benefits. First, they are associated with a Web development model familiar to most programmers. Second, they support flexible deployment architectures: the voice browser and server application can be colocated or separated, and can be managed by the same or different entities. Third, they offer the prospect of application portability across different vendor platforms.


