Informetric theories and methods for exploring the Internet: an analytical survey of recent research literature

Library Trends, Winter 2002, by Judit Bar-Ilan and Bluma C. Peritz

Rosenbaum (1998) analyzed the content of the Web sites of twenty-four Web-based community networks in Indiana. The purpose of the study was to learn about the content and the structure of these sites.

Bar-Ilan (2000b) analyzed the content of Web pages containing the phrase "S&T indicators." Several facets were introduced, including the context in which the search phrase appeared, the type of document, the server, the domain, the geographical area, and the time period for which the indicators were computed. A rather interesting finding was the existence of a large number of Web pages with data from Malaysia. Since 1992, the Malaysian government has consistently published its Science and Technology reports on the Web.

Evaluation Using Existing/New Measures

Gordon & Pathak (1999) measured the retrieval effectiveness of Web search engines. Thirty-three faculty members at the University of Michigan Business School described their information needs to experienced searchers, who then submitted appropriately phrased queries to eight search tools. The first twenty hits from each tool were retrieved, and the resulting 160 documents were presented in random order to the faculty members, who judged their relevance. Absolute retrieval effectiveness was fairly low, and there were statistically significant differences in precision among the tools.
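Precision over the first twenty hits, the quantity such a study rests on, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `precision_at_k` and the sample judgments are hypothetical, and dividing by the number of hits actually returned (rather than always by k) is an assumption made here.

```python
def precision_at_k(judgments, k=20):
    """Fraction of the first k hits that were judged relevant.

    judgments: list of booleans in rank order (True = judged relevant).
    Assumption: if fewer than k hits were returned, divide by the
    number actually returned rather than by k.
    """
    top = judgments[:k]
    if not top:
        return 0.0
    return sum(1 for judged_relevant in top if judged_relevant) / len(top)


# Hypothetical judgments for one query on one search tool:
# 5 relevant hits followed by 15 non-relevant ones.
sample = [True] * 5 + [False] * 15
score = precision_at_k(sample)
```

Computing this score per query and per tool yields the figures that a statistical comparison between tools would then be run on.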

A different approach was taken by Bar-Ilan (1999), who measured the technical precision of the retrieved documents instead of relying on subjective human relevance judgments. A document is technically relevant if it satisfies the query (i.e., all the search terms that are supposed to appear are actually present in the document, and all the terms that are not supposed to appear are absent). This is an objective measure that is simple to compute, but it does not judge the quality of the document.
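The definition of technical relevance translates almost directly into code. The sketch below is an illustration under stated assumptions, not the procedure used in the study: the function names, the word-level tokenization, and the restriction to single-word terms (phrases are not handled) are all choices made here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_technically_relevant(document, required, excluded=()):
    """True if every required term is present and every excluded term is absent.

    Assumption: terms are single words; phrase queries are not handled.
    """
    words = tokenize(document)
    has_all_required = all(term.lower() in words for term in required)
    has_no_excluded = all(term.lower() not in words for term in excluded)
    return has_all_required and has_no_excluded


def technical_precision(documents, required, excluded=()):
    """Share of retrieved documents that are technically relevant."""
    if not documents:
        return 0.0
    hits = sum(is_technically_relevant(d, required, excluded) for d in documents)
    return hits / len(documents)


# Hypothetical retrieved documents for the query: science AND indicators NOT draft
retrieved = [
    "Science indicators for Malaysia",
    "Science report",
    "Science indicators draft",
]
value = technical_precision(retrieved, ["science", "indicators"], ["draft"])
```

Only the first hypothetical document satisfies the query, so one of three retrieved documents is technically relevant. Nothing in the computation looks at how useful that document is, which is the limitation noted above.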

Oppenheim, Morris, McKnight, & Lowley (2000) gave an extensive review of the evaluation of Internet search engines. Precision was measured in most studies, but measuring recall is extremely difficult. Several suggested alternative methods were reviewed, and the authors recommended developing a standardized set of tools for search engine evaluation.

Page & Brin (1998) introduced a new method of measuring the quality of Web documents, called PageRank. The method is based on the ideas of classical citation analysis, but instead of simply counting the number of links pointing to a document, it also takes into account the quality of the page from which each link emanates. Similar ideas of weighting citations in classical citation analysis had already been introduced by Pinski & Narin (1976). Egghe (2000) slightly disagrees with the analogy drawn between classical citations and hypertext links: paper B citing paper A was necessarily written after paper A; this is not the case with Web pages, however, where reciprocal links between pages are quite common.
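The idea that a link counts for more when it comes from a highly ranked page can be illustrated with a small power-iteration sketch. This is a simplified illustration, not the authors' implementation: the damping value of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the three-page example graph are all assumptions made here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate link-based rank for a small graph.

    links: dict mapping each page to the list of pages it links to.
    Assumptions: damping factor 0.85, a fixed number of iterations,
    and pages with no outgoing links spread their rank evenly.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its rank on in equal parts to its link targets.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: A and B link to each other; C links only to A.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

In this hypothetical graph, A and B each have inbound links while C has none, so C ends up with the lowest rank. The reciprocal A and B links are the kind of structure Egghe points to as having no counterpart in classical citations.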

Henzinger, Heydon, Mitzenmacher, & Najork (1999) defined a new measure for search engines: "Search engine quality." The quality of a Web page is based on the links pointing to it. Some portion of the Web is crawled in order to estimate the "quality" of pages, and then the search engines are queried with a sample of the visited high quality pages to check if they index them.
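The final step of such a measurement, checking how many high-quality pages a search engine indexes, can be sketched as below. This is a deliberately simplified illustration: the function `index_coverage`, the `is_indexed` callback, and the sample data are hypothetical, and taking the top-scoring pages deterministically is an assumption made here that stands in for the sampling used in the study.

```python
def index_coverage(quality, is_indexed, top_n=100):
    """Fraction of the top_n highest-quality pages that an engine indexes.

    quality:    dict mapping URL -> link-based quality score from a crawl.
    is_indexed: callable(url) -> bool; hypothetical stand-in for querying
                a search engine to see whether it knows the page.
    """
    ranked = sorted(quality, key=quality.get, reverse=True)[:top_n]
    if not ranked:
        return 0.0
    found = sum(1 for url in ranked if is_indexed(url))
    return found / len(ranked)


# Hypothetical crawl scores and a hypothetical engine index.
scores = {"a.example": 0.5, "b.example": 0.3, "c.example": 0.1, "d.example": 0.1}
engine_index = {"a.example", "c.example"}
coverage = index_coverage(scores, engine_index.__contains__, top_n=2)
```

Here the two highest-scoring pages are a.example and b.example, and only the first is in the hypothetical index, so the coverage is one half.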

Content provided in partnership with Thomson Gale