Beyond SALT versus VoiceXML: Coping with the wealth of standards in speech and multimodal self-service applications

Customer Inter@ction Solutions, Mar 2003 by Scholz, K W

CALL CENTER/CRM MANAGEMENT SCOPE

Standards serve as the foundation for growth within an industry. As a new technology is spawned and begins to pique the interest of developers and consumers, its initial growth is typically haphazard and devoid of structure. As the technology reaches adolescence, however, its leaders develop standards that guide growth and interoperability, and its haphazard evolution fades. As the technologies enabling speech and multimodal self-service applications mature, many standards have emerged and combined to enable the field to approach mainstream status. The growth of standards is not without its cost, however; because of the complexity of the underlying technologies, the standards documents themselves have grown to span thousands of pages, and as a consequence constitute an overwhelming obstacle to a developer's mastery of the technology.

Furthermore, this past year has seen considerable press devoted to the so-called "conflict" between the two key standards in our industry: SALT and VoiceXML (VXML). Claims of conflict have deluded some developers into feeling pressure to make premature "choices" between them, while intimidating others into inactivity as they wait for the industry to choose the "right" one. In fact, there are over a dozen distinct standards designed to guide the development and execution of speech and multimodal applications, occasionally competing with one another but more frequently operating in harmony to guide distinct components of the application's architecture.

Deployment Architecture

Figure 1 (on page 54) illustrates the deployment architecture for a speech or multimodal application. The major components in the architecture and their functions are as follows:

Application Server. The central component is the application server, the platform and software responsible for managing the execution of the application. The application server's principal responsibilities include management of the dialog with the end-user and management of the business transaction processor, the application's business functionality.

Business transaction processor. This term describes the software and (optionally) the platform responsible for execution of the business transactions (for example, a travel reservation system, a retail banking database, a regional or national weather repository, or a securities transaction database, to name a few).

Voice gateway. During execution, the application server exchanges information with the voice gateway. This information is coded in a markup language and is conveyed using the familiar Internet delivery paradigm. The voice gateway includes:

* A markup language interpreter,

* An automatic speech recognizer (ASR),

* A text-to-speech (TTS) generator, and

* A telephone network interface (tele interface). The tele interface mediates the connection through the circuit-switched or packet-switched telephone network to the end user. The network connection will use either a direct digital interface to the circuit-switched network or voice-over-IP (VoIP) through a media gateway to the telephone network.

Voice user interface. This is an end user interface using speech over wireless or wireline telephones.

Graphical user interface. This is an end user interface using desktop PCs, PDAs, cell phones with digital visual displays, or other screen-oriented devices.

Standards

The principal standards and standardized APIs (application program interfaces) that guide the operation and interaction of the components in the architecture are shown in Figure 1, and are listed and described below. The agency responsible for each standard or API is shown in parentheses after the standard's name.

CCXML (W3C). Call Control eXtensible Markup Language is designed to provide telephony call control support for dialog systems. CCXML is intended to serve as an adjunct language for use with a VXML, SALT or other dialog implementation platform.

HTTP (IETF). Hypertext Transfer Protocol is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers.
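As a rough illustration of the request methods, status codes and headers mentioned above, the following Python sketch starts a throwaway HTTP server on the local machine and fetches two paths from it. The handler class, the path /dialog.vxml, the placeholder response body and the media type application/voicexml+xml are assumptions made for the example, not details taken from this article. Each fetch opens its own connection, which reflects the stateless character of the protocol.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DialogHandler(BaseHTTPRequestHandler):
    """Serves one hypothetical markup page; every other path gets a 404."""

    def do_GET(self):
        if self.path == "/dialog.vxml":
            body = b"<vxml version='2.0'/>"  # placeholder page
            self.send_response(200)
            self.send_header("Content-Type", "application/voicexml+xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(port, path):
    """Issue one GET on a fresh connection; return (status, content type, body)."""
    conn = HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()


def demo():
    """Start the server on a free port, fetch a known and an unknown path."""
    server = HTTPServer(("127.0.0.1", 0), DialogHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(port, "/dialog.vxml"), fetch(port, "/missing")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    found, missing = demo()
    print(found[0], found[1])
    print(missing[0])
```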

H.323 (ITU). H.323 is a standard that specifies the components, protocols and procedures that provide multimedia communication services - real-time audio, video and data communications - over packet networks, including Internet protocol (IP)-based networks. H.323 is part of a family of recommendations that provide multimedia communication services over a variety of networks.

JDBC (Sun Microsystems). Java Database Connectivity is an API that lets developers access virtually any tabular data source from the Java programming language. It provides cross-DBMS connectivity to a wide range of SQL databases and, with the JDBC API, it also provides access to other tabular data sources, such as spreadsheets or flat files.

ODBC (Microsoft). Open Database Connectivity is a widely accepted API for database access. It is based on the Call-Level Interface (CLI) specifications from X/Open and ISO/IEC for database APIs and uses Structured Query Language (SQL) as its database access language.
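The call-level pattern behind both JDBC and ODBC (open a connection, pass SQL text with bound parameters, read back rows) can be sketched without either API. The example below uses Python's standard sqlite3 module purely as a stand-in. It is not ODBC or JDBC, and the accounts table, its columns and the function lookup_balance are hypothetical, loosely modeled on the retail banking example mentioned earlier.

```python
import sqlite3


def lookup_balance(conn, account_id):
    """Run one parameterized SQL query; return the balance or None."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return None if row is None else row[0]


def demo():
    """Create an in-memory database, insert one account, query two ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
        conn.execute("INSERT INTO accounts VALUES (?, ?)", ("12345", 250.0))
        conn.commit()
        return lookup_balance(conn, "12345"), lookup_balance(conn, "99999")
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Because the application passes SQL text through a generic connection object, the same dialog logic could in principle sit in front of a different database by changing only how the connection is opened.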

 


Content provided in partnership with ProQuest