Half The Equation: Open Standards Are A First Step Toward Speech Automation

Customer Inter@ction Solutions, Mar 2005 by Nguyen, Patrick

Today, well-engineered speech recognition systems achieve high customer satisfaction and high returns on investment in many customer service areas, including stock trading, flight information, catalog ordering and directory assistance. Although speech automation's potential has become widely recognized, few IT organizations have had the means to build or maintain speech systems, relying instead on expensive services from speech engine vendors or specialist system integrators. One major impediment to speech development efforts was removed when the industry adopted open standards and Web technologies familiar to mainstream IT organizations. However, a larger obstacle still remains: speech development methodologies and tools must improve to address the unique demands of voice user interfaces before mainstream enterprises can reliably deliver high-quality speech systems at a reasonable cost.

The First Step: Open Speech Standards

The earliest development approaches required programming in the application program interface (API) specific to each speech recognition engine. This approach burdened developers with low-level, recognition engine-specific details such as exception handling and resource management. Moreover, the proprietary nature of these APIs restricted the flexibility with which enterprises could deploy applications. Most software components had to be sourced from a single vendor and had to be deployed in a single location, and the resulting applications could not be easily ported to other platforms.

The advent of voice languages such as VoiceXML and SALT contributed to a Web-based development process. These languages allow a distribution of responsibilities in a speech system between a voice browser, which performs the speech recognition function, and a server application, which contains the application logic and user interface behavior (expressed in the voice language). As a result, application developers no longer concern themselves with speech engine API calls, but instead are responsible for generating documents that can be executed by the voice browser.
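
As a rough illustration of this division of labor, the sketch below shows a server-side function that generates a VoiceXML 2.0 document for a voice browser to execute. It is a minimal example under stated assumptions, not taken from any product: the function name `build_flight_dialog`, the form and field names, the prompt wording and the example.com URL are all hypothetical, and the built-in `digits` field type may not behave identically on every platform.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_flight_dialog(submit_url: str) -> str:
    """Return a small VoiceXML 2.0 document as a string.

    All names used here (form id, field name, prompt wording) are
    illustrative assumptions.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "flight_status"})

    # One field: the voice browser plays the prompt, runs recognition,
    # and stores the result in the variable "flight_number".
    field = ET.SubElement(
        form, "field", {"name": "flight_number", "type": "digits"}
    )
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say your flight number."

    # Once the field is filled, the browser sends the value back to the
    # server application, which decides which document to return next.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled, "submit", {"next": submit_url, "namelist": "flight_number"}
    )

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_flight_dialog("http://example.com/flight-status"))
```

Note that the function contains no speech engine calls at all; recognition, prompt playback and telephony are left entirely to whichever voice browser fetches the document.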

VoiceXML (Voice Extensible Markup Language) is a standard endorsed by the World Wide Web Consortium (W3C) for speech application development. The first specification was released in March 2000 by the VoiceXML Forum (www.voicexml.org/), an industry body that now has 375 member companies, including IBM, Nuance, Motorola and AT&T. The latest version, VoiceXML 2.0, became a W3C recommendation in March 2004. VoiceXML voice browsers are already available through dozens of vendors; in all, a hundred or so vendors provide compliant products. Commercial VoiceXML deployments have been estimated in the thousands.

SALT is a newer standard, proposed by the SALT Forum (www.saltforum.org/), and is somewhat competitive with VoiceXML. The intent of SALT is to facilitate multimodal applications, allowing spoken interfaces to be used in conjunction with a keyboard and a display screen, so that Web pages can be accessed by different client devices. However, SALT can also be used to build voice-only applications, and one of its targets is to simplify speech application development. The major proponent of SALT is Microsoft, but many companies support both SALT and VoiceXML, including Intel, Cisco, HP and ScanSoft. Only a few SALT voice browsers are currently available. The most prominent is Microsoft's Speech Server, which has attracted developer interest due to its integration with Microsoft's .NET framework. To date, SALT has few publicly announced commercial deployments.

VoiceXML is a larger language that contains its own procedural and transport elements. In contrast, SALT is a lightweight extension to existing markup languages, most notably HTML and XHTML. SALT tags are embedded within the HTML DOM (document object model) event and scripting environment, a model familiar to Web developers. Dialog flow is managed by combining SALT elements with DOM object properties, methods and events. This programming approach is well-suited to multimodal applications because visual and speech elements on a Web document are peers. VoiceXML, on the other hand, has constructs designed specifically for speech-only interfaces, such as dialogs with predefined execution flows.
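
The "predefined execution flow" of a VoiceXML dialog can be pictured with a toy interpreter loop. The Python sketch below is a heavily simplified approximation of the idea behind VoiceXML form interpretation (visit the first field that has no value yet, prompt for it, repeat). It ignores events, guard conditions, mixed-initiative grammars and everything else a real voice browser does. The names `run_form` and `listen`, and the `max_turns` limit, are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def run_form(
    fields: List[str],
    listen: Callable[[str], Optional[str]],
    max_turns: int = 10,
) -> Dict[str, str]:
    """Fill fields in document order, re-prompting on a failed recognition.

    `listen(field)` stands in for prompt playback plus speech recognition;
    it returns the recognized value, or None when nothing was recognized.
    """
    values: Dict[str, str] = {}
    for _ in range(max_turns):
        # Select the first field that is still unfilled.
        pending = next((name for name in fields if name not in values), None)
        if pending is None:
            break  # every field has a value, so the form is complete
        heard = listen(pending)
        if heard is not None:
            values[pending] = heard
    return values


if __name__ == "__main__":
    # Scripted caller: the first attempt at "date" is not recognized.
    script = iter(["Boston", None, "tomorrow"])
    print(run_form(["city", "date"], lambda field: next(script)))
```

Because the loop, rather than application code, decides which field to visit next, a developer working in this style only declares the fields and their prompts. In SALT, by contrast, equivalent sequencing would be written by the developer in script and event handlers.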

Despite the competition, SALT supports various W3C standards associated with the VoiceXML standard, including SRGS, the W3C Speech Recognition Grammar Specification; SSML, the W3C language for controlling TTS (text-to-speech) pronunciation, emphasis and intonation; and ECMAScript, the scripting language specification. Moreover, SALT has been submitted to the W3C's Voice Browser working group, and some of its concepts may be incorporated into the next VoiceXML standard.
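
To make the shared grammar format concrete, the following Python sketch builds a small SRGS grammar in its XML form, listing the phrases a recognizer should accept. The helper name `build_choice_grammar`, the rule id and the city names are hypothetical; this is an illustrative sketch rather than a production grammar.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Namespace defined by the W3C Speech Recognition Grammar Specification 1.0.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
# ElementTree serializes attributes in this namespace with the "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_choice_grammar(
    rule_id: str, phrases: Iterable[str], lang: str = "en-US"
) -> str:
    """Return an SRGS XML grammar that accepts any one of `phrases`."""
    grammar = ET.Element(
        "grammar",
        {
            "xmlns": SRGS_NS,
            "version": "1.0",
            XML_LANG: lang,
            "root": rule_id,   # the rule the recognizer starts from
            "mode": "voice",
        },
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        # Each <item> is one alternative the caller may say.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_choice_grammar("city", ["Boston", "Chicago", "Denver"]))
```

In principle, a grammar of this kind could be referenced from either a VoiceXML document or a SALT-enabled page, which is part of what makes the shared standards valuable.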

VoiceXML and SALT are both presentation layer languages that deliver a number of benefits. First, they are associated with a Web development model familiar to most programmers. Second, they support flexible deployment architectures - the voice browser and server application can be colocated or separated, and can be managed by the same or different entities. Third, they offer the prospect of application portability across different vendor platforms.

 
