The World Wide Web and emerging Internet resource discovery standards for scholarly literature - Networked Scholarly Publishing

Library Trends, Spring, 1995 by Stuart L. Weibel

Retaining SGML Structure for Indexing

The original SGML versions of the documents are used to build an inverted-file database, which is used to search the collection and generate pointers to the HTML version of the document. It is important to note that the original SGML markup is retained in this database, and this markup supports searching in specific fields (for example, limiting the search term to occurrences in the title or abstract). Thus, although the delivery of scholarly journals into the WWW involves some display formatting compromises, it need not result in loss of structured document searching capabilities.
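As a rough modern illustration of field-limited searching over an inverted file, the sketch below indexes each word together with the field it came from. The sample records, the field names (standing in for SGML elements such as title and abstract), and the function names are all assumptions made for illustration; the article does not describe the actual database software.

```python
from collections import defaultdict

# Hypothetical records: field names stand in for SGML elements such as
# <title> and <abstract>; the real collection's markup is not shown here.
DOCUMENTS = {
    "doc1": {"title": "Resource discovery on the Internet",
             "abstract": "Standards for locating scholarly literature."},
    "doc2": {"title": "Scholarly publishing",
             "abstract": "Delivering journals over the Internet with SGML."},
}

def build_index(documents):
    """Map each lowercased word to the set of (doc_id, field) pairs containing it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for word in text.lower().split():
                # Recording the field alongside the document keeps the structure searchable.
                index[word.strip(".,;:")].add((doc_id, field))
    return index

def search(index, term, field=None):
    """Return sorted document ids containing term, optionally limited to one field."""
    postings = index.get(term.lower(), set())
    return sorted({doc for doc, f in postings if field is None or f == field})

INDEX = build_index(DOCUMENTS)
print(search(INDEX, "internet"))                 # match in any field
print(search(INDEX, "internet", field="title"))  # match in the title only
```

Because every posting carries its field name, the same index answers both unrestricted and field-restricted queries; a production system would also map each document id to the location of its HTML version.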

Web Browsers as Remotely Programmable User Interfaces

The World Wide Web is a prominent example of client-server computing. A client is a program that issues a request for service to another, largely independent, piece of software (the server) that may reside on the same machine or, more often, on another machine. The two are linked to the degree that they share a communications protocol - a formally specified language for communicating with each other. The protocol in the Web that supports this client-server communication is called the Hypertext Transfer Protocol (HTTP).

An HTTP server is a fairly simple piece of software that "listens" at an Internet port for requests issued by a Web browser. In its simplest form, this request is a string of characters known as a Uniform Resource Locator (URL) that specifies a scheme (ftp, gopher, or http, for example), a host name (the machine on whose file system the resource resides), and finally, what is interpreted as a path to the location of a file on the host machine (this is not strictly the case but illustrates the basic workings of the protocol).
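The three parts of a URL described above can be seen by taking one apart. This is a small sketch using Python's standard urllib.parse module; the URL itself and the helper name split_url are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return the (scheme, host name, path) components of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

# Hypothetical URL used only for illustration.
scheme, host, path = split_url("http://www.example.org/journals/issue1/article.html")
print(scheme)  # http
print(host)    # www.example.org
print(path)    # /journals/issue1/article.html
```

The path component is simply handed to the server, which is free to interpret it as a file location or as something else entirely.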

The server, having received a request for a document under its control, sends that file (typically HTML-encoded text) to the client that issued the request. Links may be embedded in the text, allowing users to jump to a different part of the document or another document entirely. In effect, these links become navigational controls embedded in the document, allowing the information provider to program the user interface to a limited degree.
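The request-and-response cycle described above can be sketched with Python's standard http.server and http.client modules. Everything specific here is an assumption for illustration: the two in-memory pages, their paths, and the helper name fetch. A real server would normally read documents from its file system rather than from a dictionary.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical documents keyed by request path. Each contains a link that
# acts as a navigational control pointing at the other document.
PAGES = {
    "/index.html": b'<html><body><a href="/article.html">Read the article</a></body></html>',
    "/article.html": b'<html><body><a href="/index.html">Back to contents</a></body></html>',
}

class DocumentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet

def fetch(path):
    """Start a throwaway local server, request one path, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, body = fetch("/index.html")
print(status, body.decode("ascii"))
```

Changing the link inside a page changes where the reader can go next, without any change to the browser; this is the limited sense in which the provider programs the user interface.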

HTML Forms: Getting Information from the User

A capability known as HTML Forms allows the document provider to interact with the user by putting up a template of text entry fields and several varieties of check boxes. The content and structure of the forms can be tailored to the specific task at hand, in effect a remote programming of the Web browser's capability that is as easily modified as any HTML file. Thus the user has what approaches a universal client application - familiar in its appearance and behavior but adaptable to a wide variety of search and retrieval situations.
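A form's round trip can be sketched as follows: the provider publishes the form as ordinary HTML, the browser encodes whatever the user enters, and the server decodes it. The form markup, the /search action, the field names query and field, and the helper parse_form are all hypothetical; the encoding and decoding use Python's standard urllib.parse module.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical search form. Because it is plain HTML, the provider can
# change the available fields by editing this text alone.
FORM_HTML = """
<form action="/search" method="get">
  <input type="text" name="query">
  <input type="checkbox" name="field" value="title"> Title only
  <input type="submit" value="Search">
</form>
"""

def parse_form(encoded):
    """Decode a submitted form string into a name -> first value dictionary."""
    return {name: values[0] for name, values in parse_qs(encoded).items()}

# What a browser would send after the user fills in the form above.
submitted = urlencode({"query": "resource discovery", "field": "title"})
print(submitted)              # query=resource+discovery&field=title
print(parse_form(submitted))  # {'query': 'resource discovery', 'field': 'title'}
```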

From the provider's point of view, changes in search capabilities no longer require redistributing software to an entire customer base but rather a relatively straightforward change in a data file.

In the future, it is likely that the HTML standard will support a persistent toolbar capability (not unlike the toolbars currently found in Microsoft Windows and Apple Macintosh application software). When this is available, controls will not scroll away as the user moves through a document as they do now.


 

Content provided in partnership with Thompson Gale