Next: Future Work Up: No Title Previous: Results

Conclusions

A personalized information filtering system must be able to specialize to user interests, adapt as those interests change, and explore the domain for potentially relevant information. The implementation of the personalized information filtering agents demonstrates that relevance feedback and the genetic algorithm can be used together to build personalized information filtering systems. The results of real user tests were promising: users who became familiar with the system used it as a powerful news filtering tool. However, we discovered that more work needs to be done to help users understand the agent better. Results of simulated tests under controlled conditions were more conclusive. They show that relevance feedback alone is sufficient for profiles to specialize to static user interests, but that it cannot adapt as a user's interests change. The results further show that genetic algorithms are a promising approach for modeling adaptation and exploration in an information filtering system. More precisely, relevance feedback helps improve the system's overall precision, while genetic algorithms help improve its recall.
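The way relevance feedback specializes a profile can be sketched with a Rocchio-style update, one standard way to implement relevance feedback over keyword-weight vectors. This is an illustrative sketch, not Newt's actual update rule; the function name and the `alpha`/`beta` parameters are assumptions.

```python
def relevance_feedback(profile, article, feedback, alpha=0.9, beta=0.1):
    """Rocchio-style update (illustrative sketch): nudge the profile's
    keyword weights toward (positive feedback) or away from (negative
    feedback) the keyword weights of the rated article.

    profile, article: dicts mapping keyword -> weight.
    feedback: +1 if the user marked the article relevant, -1 otherwise.
    """
    updated = {}
    for term in set(profile) | set(article):
        w = alpha * profile.get(term, 0.0) + feedback * beta * article.get(term, 0.0)
        if w > 0.0:  # drop terms whose weight falls to zero or below
            updated[term] = w
    return updated
```

Repeated positive feedback concentrates weight on the terms of articles the user likes, which is why precision improves; but since the update can only reweight terms seen in rated articles, it offers no mechanism for jumping to entirely new topics, which is the gap the genetic algorithm fills.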

Several criticisms of our approach have been made. One of the questions asked most frequently by those watching demonstrations of Newt is that of serendipity. This question is of greatest concern to people from the media industry, especially newspapers. The question is: how would the system be able to recommend relevant articles that the user could not possibly have known to ask for in the first place? The genetic algorithm provides at least a partial solution to the serendipity problem. If a user's preferences are very stable, she can set the mutation rate low enough that the articles retrieved come mostly from her current set of newsgroups. On the other hand, a user who likes to continually receive information about different topics can turn the ``serendipity knob'' higher. An advantage of this design is that a user can have serendipity when she wants it, and as much as she wants. The solution is limited, however, since the only serendipity the system can produce is searching for the same keywords as before in other newsgroups.
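A minimal sketch of how a user-adjustable mutation rate could act as such a ``serendipity knob'' follows. The profile representation and field names here are assumptions for illustration, not Newt's actual data structures.

```python
import random

def mutate_profiles(profiles, all_newsgroups, mutation_rate, rng=random):
    """Illustrative sketch: with probability mutation_rate, replace a
    profile's newsgroup field with a randomly chosen newsgroup while
    keeping its keywords unchanged. A low rate keeps retrieval within
    the user's current newsgroups; a high rate hunts for the same
    keywords in unfamiliar groups."""
    mutated = []
    for p in profiles:
        p = dict(p)  # copy so the original profiles are untouched
        if rng.random() < mutation_rate:
            p["newsgroup"] = rng.choice(all_newsgroups)
        mutated.append(p)
    return mutated
```

Note how the sketch mirrors the limitation stated above: mutation changes where the system looks, not which keywords it looks for.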

Another criticism of this system is that a keyword-based approach to filtering has inherent shortcomings. There are limitations to the kinds of concepts that can be expressed in terms of keywords, and as a result some news articles will be difficult to retrieve using keyword-based searches. One concept that is difficult to express using keywords alone is point of view: for example, it is almost impossible to decide whether a news editorial on the budget represents the liberal or the conservative position based only on the keywords used in the editorial. Other concepts that cannot conveniently be expressed merely in terms of keywords include humor, rhetoric, satire, and gossip. These expose the shortcomings of a keyword-based system. However, keyword-based systems work fairly well for most newswire articles, which typically deal with a particular event, person, place, or thing, and contain enough keyword clues to be exploited (see Section ). Note that Newt is not wedded to a keyword-based search engine. A system very similar to Newt could be built, based on the framework described earlier in Chapter , that would use a different search engine; only the document and profile representations and the effect of feedback on those representations would change.

Another criticism is that our research focused mostly on newsgroups that carry newswires. How would the approach generalize to other kinds of information? As mentioned earlier, these constraints were imposed mainly by the quality of the information and by limited computing power. Some newsgroups were excluded from the analysis because of the poor quality of their keywords, in terms of spelling and consistency of usage, as well as because of processing constraints. The former is the greater problem, since the latter can be overcome with approximate but efficient algorithms. Given data of sufficiently good quality, the filtering approach described in this thesis should generalize to other data streams as well.

A brief discussion of the impact of personalized information filtering agents on the conventional media industry is in order. The impact of systems such as Newt (or their commercial-grade successors) on the traditional media industry could be quite significant. The trend toward automating substantial components of a news editor's filtering responsibilities is likely to continue, even if the ideal of complete automation remains distant, if not unreachable. A traditional newspaper serves many purposes: information, entertainment, communal needs, general interest, setting the national agenda, and so on. The niche of personalized, prioritized information is likely to be the first in which electronic surrogates for traditional newspapers succeed; other niches may take longer to occupy and are likely to be more difficult to automate. Nonetheless, the locus of control is moving closer to the consumer, who is likely to have a much greater say in regulating the incoming flow of information. Such systems also empower producers of information to reach a large number of consumers and vice versa; that is, there will be fewer middlemen in the process. Personalized information filtering agents take us a step closer to managing the information-rich world of the future.





MIT Media Lab - Autonomous Agents Group - agentmaster@media.mit.edu