Precise and Efficient Retrieval of Captioned Images: The MARIE Project
Library Trends, Fall, 1999 by Neil C. Rowe
ABSTRACT
THE MARIE PROJECT HAS EXPLORED knowledge-based information retrieval of captioned images of the kind found in picture libraries and on the Internet. It exploits the idea that images are easier to understand with context, especially descriptive text near them, but it also does image analysis. The MARIE approach has five parts: (1) find the images and captions; (2) parse and interpret the captions; (3) segment the images into regions of homogeneous characteristics and classify them; (4) correlate caption interpretation with image interpretation using the idea of focus; and (5) optimize query execution at run time. MARIE emphasizes domain-independent methods for portability at the expense of some performance, although some domain specification is still required. Experiments show MARIE prototypes are more accurate than simpler methods, although the task is very challenging and more work is needed. Its processing is illustrated in detail on part of an Internet World Wide Web page.
INTRODUCTION
Multimedia data are increasingly important information resources for computers and networks. Much of the excitement over the World Wide Web is about its multimedia capabilities. Images of various kinds are its most common nontextual data. But finding the images relevant to some user need is often much harder than finding text for a need. Careful content analysis of unrestricted images is slow and prone to errors. Because many images have captions or descriptions, it helps to find and use that text.
Nonetheless, multimedia information retrieval is still difficult. Many problems must be solved just to find caption information in the hope of finding related images. Only a 1 percent success rate was obtained in experiments trying to retrieve photographs depicting single keywords like "moon" and "hill" using the AltaVista search engine on the World Wide Web (Rowe & Frew, 1998). This is probably because most text on Web pages, and even on pages with well-captioned photographs, was irrelevant to the photographs, and the words searched for had many senses. To improve this performance several things are needed:
* a theory of where captions are likely to be on pages;
* a theory of which images are likely to have described content;
* language-understanding software to ascertain the correct word senses and their interrelationships in the caption candidates;
* image-understanding software to obtain features not likely to be in the caption;
* a theory connecting caption concepts to image regions; and
* efficient methods of retrieval of relevant images in response to user queries or requests.
Note that speed is critical only in the last phase; while efficient methods are always desirable, accuracy is more of a concern because it is so low for keyword-based retrieval. The time to do careful language and image processing can be justified during the indexing of a database if it can significantly improve later retrieval accuracy.
Consider Figure 1, which shows part of a U.S. Army Web page. Much text is scattered about, but not all of it refers to the pictures. The two formal captions (in italics, but not otherwise identified) are inconsistently placed with respect to their photographs. But the title "Gunnery at Udairi" is a caption too. Next, note that many of the words and phrases in these candidate captions do not describe the pictures. Neither picture shows "U.S. Army Central Command," "power generator equipment," an "Iraqi," "the Gulf War," or "Fort Hood"; matching would falsely retrieve this page for any of these key phrases. Similarly, the words "commander," "senior," "signal," "fire," "target," and "live" are all used in special senses, so this page would be falsely retrieved for queries intending to refer to their most common senses. The only way to eliminate such errors is to parse and interpret caption candidates using detailed linguistic knowledge. Finally, note that many things seen in both photographs are not mentioned in their captions. Only a small part of the left photograph area is devoted to the people, and the right photograph displays many features of the tanks not mentioned in its caption. Thus there are many challenges in indexing the multimedia data on these pages.
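The false-retrieval problem described above can be illustrated with a minimal sketch of naive keyword matching. The caption strings below are hypothetical stand-ins assembled from the phrases quoted in the preceding paragraph, not the actual wording of the page in Figure 1, and `tokenize` and `naive_match` are illustrative helpers rather than any part of MARIE.

```python
import re

# Hypothetical candidate-caption text, built only from phrases quoted above;
# it is NOT the actual wording of the Web page in Figure 1.
CANDIDATE_CAPTIONS = [
    "Gunnery at Udairi",
    "senior commander observes live fire at a target",
    "power generator equipment from Fort Hood used since the Gulf War",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_match(query, captions):
    """Return True if every query word occurs somewhere in the captions."""
    vocabulary = set()
    for caption in captions:
        vocabulary.update(tokenize(caption))
    return all(word in vocabulary for word in tokenize(query))


# Queries about things the photographs do not depict still match,
# because matching ignores word sense and what the caption actually describes.
for query in ["power generator", "fire", "Fort Hood", "moon"]:
    print(query, "->", naive_match(query, CANDIDATE_CAPTIONS))
```

Under these assumptions the first three queries match even though neither photograph depicts a generator, a fire in the common sense, or Fort Hood. Bag-of-words matching of this kind has no way to tell such hits from genuine ones.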
[Figure 1 ILLUSTRATION OMITTED]
Noteworthy current systems for image retrieval are QBIC (Flickner et al., 1995), Virage (Virage Inc., San Mateo, California, USA), and VisualSEEK (Smith & Chang, 1996), which exploit simple-to-compute image properties like average color and texture. The user specifies color and texture patches, or perhaps an entire image, which is then compared to the images in the database to find the best matches. But these systems strongly emphasize visual properties and can confuse very different things of accidentally similar appearance, like seeing a face in an aerial photograph. So these systems would not help for a typical Web page like Figure 1 since color similarity to the images there would not mean much. Another category of current image-retrieval systems like Chabot (Ogle, 1995) primarily exploits descriptive information about the image, but all this information must be entered manually for each image by someone knowledgeable about it, which requires a considerable amount of tedious work.
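To make the weakness of appearance-only matching concrete, here is a minimal sketch of retrieval by average color. It is not the algorithm of QBIC, Virage, or VisualSEEK; the function names, the tiny pixel lists, and the image labels are all hypothetical, chosen only to show how two unrelated images can look identical to such a measure.

```python
def average_color(pixels):
    """Return the mean (R, G, B) of a non-empty list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def best_match(query_pixels, database):
    """Return the name of the database image closest in average color."""
    q = average_color(query_pixels)
    return min(
        database,
        key=lambda name: color_distance(q, average_color(database[name])),
    )


# Hypothetical toy "images": a few RGB pixels each, invented for illustration.
DATABASE = {
    "aerial_desert_photo": [(200, 170, 130), (190, 160, 120), (210, 180, 140)],
    "forest_photo": [(30, 90, 40), (40, 110, 50), (20, 80, 30)],
}

# A skin-toned patch, as a user looking for faces might specify.
face_patch = [(205, 175, 135), (195, 165, 125)]

print(best_match(face_patch, DATABASE))
```

With these invented values the face-colored query selects the aerial desert photograph, since the two share the same average color even though their content is unrelated.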



