Vaulting the language barrier: computers are helping to search texts and data now shrouded in linguistic differences

Science News, March 8, 1997 by Janet Raloff

Marjorie Hlava can't read Russian, but that doesn't stop her from learning the contents of a document printed in the Cyrillic alphabet. She simply places each page under the cover of the flatbed scanner in her Albuquerque office, presses a button, and waits as her computer displays an English-language version.

Using only English, she can also search Russian databases, such as files of published scientific reports. She types in the key words or phrases that describe her interests, then lets a series of computer programs take over. After converting her request into Russian, they sift through data files for references to documents that seem to match, convert those matches back into English, and display them on her computer.
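The translate, search, and translate-back loop described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the word lists, the sample document titles, and the function names are all hypothetical, and the word-by-word lookup stands in for whatever methods the VINITI software actually uses, which the article does not describe.

```python
# Toy bilingual thesaurus. The entries are illustrative assumptions,
# not taken from the VINITI system.
EN_TO_RU = {
    "microwave": "микроволновый",
    "heating": "нагрев",
    "plasma": "плазма",
    "metal": "металл",
    "treatment": "лечение",
    "cancer": "рак",
}
RU_TO_EN = {ru: en for en, ru in EN_TO_RU.items()}

# Hypothetical Russian document titles standing in for a database.
DOCUMENTS = [
    "микроволновый нагрев плазма",
    "нагрев металл",
    "лечение рак",
]


def translate(text, table):
    """Word-by-word lookup; words missing from the table are kept unchanged."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def cross_language_search(query_en):
    """Translate an English query, match Russian titles, return English titles."""
    query_ru = set(translate(query_en, EN_TO_RU).split())
    hits = []
    for doc in DOCUMENTS:
        # Score each title by how many translated query words it contains.
        overlap = len(query_ru & set(doc.split()))
        if overlap:
            hits.append((overlap, translate(doc, RU_TO_EN)))
    # Best matches first; ties keep their original database order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [title for _, title in hits]


print(cross_language_search("microwave heating"))
```

With these sample entries, the query "microwave heating" returns the plasma title first and the metal title second, because the first shares two query words and the second only one.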

More than once she has even conversed via her laptop computer (on a plane, for instance) with Russians who know no English. She types her side of the dialogue in English, which the computer converts into a Russian display. The other party types his or her responses in Russian, which the computer translates for Hlava. They can chat for hours that way, provided they restrict their words and phrases to those in the thesauruses, or set lists of words, on her machine.

That isn't too hard, Hlava notes, since the Russian-to-English portion currently contains some 750,000 words and phrases and the English-to-Russian one nearly 600,000.

Most of the software programs that allow fairly inexpensive, off-the-shelf computer hardware to translate Russian are preliminary versions being developed by Gerold G. Belonogov and Boris A. Kuznetsov at VINITI, the All-Russian Institute for Scientific and Technical Information in Moscow. Hlava's company, Access Innovations, helped channel some U.S. government financing into the creation of those systems.

As the Internet has been demonstrating over the past few years, "we now have access to an enormous amount of information that didn't used to be available," notes Douglas W. Oard of the University of Maryland in College Park. "But it's only accessible to those who speak the language. And as the World Wide Web's name indicates, not everything on the Internet is in English."

Because users seldom pay for data they find on the Web, there is little incentive for those who post the information to invest in expensive, time-consuming multilanguage translations or indexing. What a user needs to make full and efficient use of a foreign database or the Internet, Oard explains, is a system that translates among languages, searches effectively for answers to a user's query or stated interests, and then ranks any matches by the likelihood of their satisfying a particular user's needs.

For many persons interested in focused areas of science or engineering, such as the microwave heating of plasmas or drugs to treat cancer patients, "the things that Marjorie Hlava [and her VINITI colleagues] do are just as good as you would like," observes Oard. "The limitation is that humans can find them difficult to use"; that is, users need to be trained in effective search strategies.

He and a host of others are working to make foreign data and files easily accessible to an even broader audience, one with little training in data searches. Unfortunately, he says, "we're only about half as good as you'd like at doing this. And getting halfway turns out to have been rather easy." It's the second half that will prove costly in both time and money, he maintains.

The payoff could prove substantial, he and Hlava agree. Such efforts could uncloak a world of research and data for people who don't speak a foreign language.

Today, computer technologies are being developed to translate a wide range of mother tongues. At the behest of the European Parliament, for instance, several ambitious programs are working to make documents prepared in English or French intelligible to those who read any of the other nine official languages of the European Union. Even more challenging projects around the world seek to pair English with languages written in non-Roman characters, such as Japanese, Chinese, Greek, Arabic, Russian, Korean, and Vietnamese.

Few of these efforts are designed to provide full machine translation of the documents; rather, their aim is a more limited rendering of some important aspects, such as titles, key words, or abstracts.

Indeed, this may be sufficient if the goal is merely to identify a few particularly valuable documents that a user might then choose to have translated in full, Oard observes. The projects could also help electronic browsers identify more circumscribed information, such as images posted on the Internet with captions in a foreign language, names and affiliations of foreign scientists who have conducted research on a topic of interest, or newly coined foreign terms or short quotations in a text.

Even limited cross-language identification and retrieval of electronically stored text represents a tall order, Oard notes.

For instance, even within a single language, commercial database searching remains a fairly unscientific, "seat-of-the-pants thing," observes Richard S. Marcus, an information scientist at the Massachusetts Institute of Technology. What's not well recognized, he says, is that unless someone is an expert in searching or has the services of a good librarian, "you typically are able to retrieve only about 5 percent of the relevant documents available."
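The fraction Marcus describes, relevant documents retrieved out of all relevant documents available, is what information scientists usually call recall, although the article does not use that term. A minimal sketch of the arithmetic follows; the document IDs and counts are made-up numbers chosen only to reproduce a 5 percent figure.

```python
def recall(retrieved, relevant):
    """Share of the relevant documents that the search actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical example: 200 relevant documents exist (IDs 0 to 199).
relevant_ids = range(200)
# The search returns 10 of them plus 5 irrelevant ones (IDs 1000 to 1004).
retrieved_ids = list(range(10)) + list(range(1000, 1005))

print(f"{recall(retrieved_ids, relevant_ids):.0%}")
```

Note that the irrelevant results do not lower recall; they would count against a separate measure, precision.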

 

Content provided in partnership with Thompson Gale