
Information Retrieval

Three main retrieval paradigms can be identified in the IR literature [30]: (i) statistical, (ii) semantic, and (iii) contextual/structural. The statistical approach emphasizes correlations among word counts in documents and document collections. Salton [39][38] describes the use of statistical schemes such as probabilistic and vector space models for document representation and retrieval. The Smart system [42] is an example of a text processing and retrieval system based on the vector processing model. Latent Semantic Indexing (LSI) [5] is another statistical method, one that captures term associations in documents. The semantic approach characterizes documents and queries so as to represent their underlying meaning [36][17]; it emphasizes natural language processing or the use of AI-like frames. The third approach, also known as "smart" Boolean, takes advantage of the structural and contextual information typically available in retrieval systems. For example, it may use thesauri in which relationships among terms are encoded [12], or exploit the context and structure generally available from the document terms [30]. CONIT [31] is an example of a system built in the "smart" Boolean framework.
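To make the statistical paradigm concrete, the following is a minimal sketch of vector space retrieval in Python. It is not the implementation used by the Smart system or by any cited work: the function names (`tokenize`, `idf_weights`, `vectorize`, `cosine`, `rank`), the whitespace tokenizer, and the particular TF-IDF weighting are all assumptions chosen for illustration. Documents and the query are represented as sparse term-weight vectors, and documents are ranked by cosine similarity to the query.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def idf_weights(tokenized_docs):
    """Inverse document frequency per term; the log(1 + N/df) form is an assumption."""
    n = len(tokenized_docs)
    df = Counter(t for tokens in tokenized_docs for t in set(tokens))
    return {t: math.log(1 + n / df[t]) for t in df}

def vectorize(tokens, idf):
    """Sparse {term: tf * idf} vector; terms unseen in the collection are ignored."""
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    tokenized = [tokenize(d) for d in docs]
    idf = idf_weights(tokenized)
    q = vectorize(tokenize(query), idf)
    scores = [(cosine(q, vectorize(tokens, idf)), i)
              for i, tokens in enumerate(tokenized)]
    return sorted(scores, reverse=True)
```

Calling `rank("vector retrieval", docs)` on a small list of strings returns the documents ordered by similarity to the query; a document sharing no terms with the query scores 0.0.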

The Internet is one of the largest publicly available "databases" of documents (among other things) and is a good testing ground for most retrieval techniques. With the explosive growth of the Internet in recent years, a number of services have arisen to help users search for and retrieve documents from servers around the world, among them WAIS, Gopher, and the World Wide Web. Wide Area Information Servers (WAIS) [21] is a network-based document indexing and retrieval system for textual data. Its servers maintain inverted indexes of keywords that are used for efficient retrieval of documents. WAIS also lets users provide relevance feedback to further specialize an initial query. Gopher [32] is primarily a tool for browsing hierarchically organized documents, but it also allows users to search for information using full-text indexes. In the World Wide Web (WWW) [3], information is organized using the hypertext paradigm, in which users explore information by selecting hypertext links to other information. Documents may also contain indexes that the user can search.
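The idea of an inverted index of keywords can be sketched in a few lines of Python. This is an illustrative assumption about the general data structure, not a description of how WAIS itself is implemented; the function names (`build_inverted_index`, `search_all`, `search_any`) and the whitespace tokenization are hypothetical. Each keyword maps to the set of documents that contain it, so a keyword lookup avoids scanning every document.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each keyword to the set of ids of the documents that contain it.

    `docs` is a {doc_id: text} dictionary; tokenization is naive whitespace
    splitting on lowercased text.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all(index, keywords):
    """Ids of documents containing every keyword (set intersection)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)

def search_any(index, keywords):
    """Ids of documents containing at least one keyword (set union)."""
    result = set()
    for k in keywords:
        result |= index.get(k.lower(), set())
    return result
```

The index is built once per collection; each query then touches only the posting sets of its own keywords, which is what makes this structure efficient for retrieval.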

A number of commercial retrieval systems are available on the market. Lexis/Nexis and WestLaw are well-known information services that contain legal and other information. Both services retrieve documents in response to Boolean queries from users. Hoover, marketed by Sandpoint, is an agent that acts as an intelligent librarian that knows how and where to look for information. AppleSearch, released by Apple, is quite similar to Hoover.





MIT Media Lab - Autonomous Agents Group - agentmaster@media.mit.edu