A survey of metadata research for organizing the web
Library Trends, Fall, 2003 by Jane L. Hunter
* Metadata Harvesting--the Open Archives Initiative (OAI) provides a protocol for data providers to make their metadata and content accessible--enabling value-added search and retrieval services to be built on top of harvested metadata.
* Multimedia metadata--there will be a further move away from textual resources to new multimedia formats that support better quality and higher compression ratios, e.g., images (JPEG-2000), video (MPEG-4), audio (MP3), 3D (VRML, Web3D), multimedia (SMIL, Shockwave Flash), and interactive digital objects. All of these new media types will require complex fine-grained metadata, extracted automatically where possible.
* Rights metadata--emerging standards such as MPEG-21 and XrML are designed to enable automated copyright management and services.
* Automatic metadata extraction--technologies to enable the automatic classification and segmentation of digital resources. In particular, automatic image processing, speech recognition, and video-segmentation tools will enable content-based querying and retrieval of audiovisual content.
* Search engines:
** Smarter agent-based search engines;
** Federated search engines;
** Peer-to-peer search engines;
** Multimedia search engines;
** Multilingual search engines;
** New search interfaces--search interfaces that present results graphically;
** Automatic/dynamic aggregation and generation of search results into hypermedia and multimedia presentations.
* Personalization/customization--autonomous agents that push relevant information to the user based on user preferences that may be personally configured or learned by the system.
* Broadband networks--multigigabit-capable networks for high-quality video-conferencing and visualization applications:
** Grid computing--distributed computing and communications infrastructures for data-intensive computing applications;
** The Semantic Grid--the combination of semantic Web technologies with grid computing to provide large-scale data access and integration to the e-Science community.
* Mobile and wireless technologies--delivery of information to mobile devices or appliances based on users' current context or location.
* Authentication--technologies to ensure trust and record the provenance of metadata.
* Annotation systems--tools that enable users to attach their own subjective notes, opinions, and views to resources for others to access and read.
* Preservation metadata--metadata to support long-term preservation strategies for all types of digital resources.
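To make the metadata-harvesting item in the list above more concrete, the following is a minimal sketch of how a harvester might construct an OAI protocol request. It assumes the standard `ListRecords` verb and the `oai_dc` metadata prefix defined by the protocol; the base URL `https://example.org/oai` and the function name `build_list_records_url` are hypothetical and used only for illustration. The sketch builds the request URL only and does not contact any repository.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None,
                           resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative sketch)."""
    if resumption_token is not None:
        # A resumption token continues an earlier, incomplete list request
        # and must not be combined with the other selection arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec      # restrict harvesting to one set
        if from_date:
            params["from"] = from_date    # harvest records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical data provider endpoint, for illustration only.
url = build_list_records_url("https://example.org/oai", from_date="2003-01-01")
print(url)
```

A service provider would issue such requests periodically, store the returned records, and build its search and retrieval services over the harvested metadata.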
2.1 XML Technologies and Metadata
XML and its associated technologies--XML Namespaces, XML query languages, and XML databases--are enabling implementers to develop metadata schemas, application profiles, large repositories of XML metadata, and search interfaces built on XML query languages. These technologies are key to the automated processing, integration, and exchange of information over the Internet.
2.1.1 Extensible Markup Language (XML). XML (W3C XML, 2003) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. Because XML makes it possible to exchange data in a standard format, independent of storage, it has become the de facto standard for representing metadata descriptions of resources on the Internet.
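As an illustration of XML as a storage-independent format for metadata, the sketch below parses a small metadata record using Python's standard `xml.etree.ElementTree` module. The record uses the Dublin Core element namespace; the wrapper element `metadata`, the sample values, and the helper name `extract_fields` are assumptions chosen for this example, not a format prescribed by the sources discussed here.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace, bound to the conventional "dc" prefix.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# A small, hypothetical metadata record describing a resource.
RECORD = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A survey of metadata research for organizing the web</dc:title>
  <dc:creator>Jane L. Hunter</dc:creator>
  <dc:date>2003</dc:date>
</metadata>"""


def extract_fields(xml_text):
    """Return a dict mapping Dublin Core element names to lists of values."""
    root = ET.fromstring(xml_text)
    dc_prefix = "{" + NS["dc"] + "}"
    fields = {}
    for child in root:
        # ElementTree expands qualified names to "{namespace-uri}local-name".
        if child.tag.startswith(dc_prefix):
            name = child.tag[len(dc_prefix):]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


fields = extract_fields(RECORD)
print(fields["title"][0])

# The same record can be queried with a namespace-aware path expression.
root = ET.fromstring(RECORD)
print(root.findtext("dc:creator", namespaces=NS))
```

Because the record is plain text with namespace-qualified element names, any XML-aware tool can read it regardless of how the originating system stores its data.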