Results

The performance measures used are precision and recall, which are well known and commonly used to evaluate information retrieval systems. Precision is the fraction of retrieved articles that were relevant. Recall is the fraction of relevant articles that were retrieved. This experiment evaluates the behavior of precision and recall over successive user sessions. The database does not change over the duration of the experiment. To calculate recall, every article in the newsgroups searched by the profile was classified by hand. An article was classified as relevant if the major topic of the news story was associated with China. Fortunately, the articles were straightforward to classify and there were very few borderline cases.
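As a minimal sketch of the two measures, the following computes precision and recall for a single retrieval, treating both as fractions. The function name, the article identifiers, and the handling of the empty cases are illustrative assumptions, not taken from the experiment itself.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one retrieval.

    retrieved: iterable of article ids returned by the search
    relevant:  iterable of article ids judged relevant by hand
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant articles that were retrieved
    # Precision is undefined when nothing is retrieved; 1.0 is assumed here.
    precision = hits / len(retrieved) if retrieved else 1.0
    # Recall is undefined when nothing is relevant; 1.0 is assumed here.
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical example: 4 articles retrieved, 3 of them among the 6 relevant ones.
p, r = precision_recall(
    ["a1", "a2", "a3", "a4"],
    ["a1", "a2", "a3", "a7", "a8", "a9"],
)
print(p, r)  # 0.75 0.5
```

Computing recall needs the full set of relevant articles, which is why every article in the searched newsgroups has to be classified by hand.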

The figure shows the results of the experiment as four graphs. Each graph is a plot of precision versus recall at a different point in time, as indicated by its label. When a search is performed, the articles are effectively sorted by their similarity scores. If all the articles are retrieved for the user, the recall is 1, but the precision is very low. If none of the articles are retrieved, the recall is 0, but the precision is 1 (by convention, since no irrelevant article has been retrieved). For intermediate numbers of articles retrieved, different values of recall and precision will be observed. By using the number of articles retrieved as a parameter, a graph can be drawn connecting the set of all possible precision-recall pairs. The graphs allow the performance of each search to be compared for all possible values of the parameter.
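The construction of one such graph can be sketched as follows: rank the articles by similarity score, then record a (recall, precision) pair for every possible number of retrieved articles. The function name, the example scores, and the convention of precision 1 at the empty cutoff are assumptions made for illustration.

```python
def precision_recall_curve(scores, relevant):
    """Return a list of (recall, precision) pairs, one per cutoff k = 0..N.

    scores:   dict mapping article id -> similarity score
    relevant: set of article ids judged relevant (assumed non-empty)
    """
    # Sort articles by descending similarity, as the search effectively does.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # k = 0: nothing retrieved; precision is taken as 1 by convention.
    points = [(0.0, 1.0)]
    hits = 0
    for k, article in enumerate(ranked, start=1):
        if article in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


# Hypothetical scores for five articles, two of which are relevant.
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.1}
curve = precision_recall_curve(scores, relevant={"a", "c"})
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

The last point always has recall 1, with precision equal to the share of relevant articles in the whole collection, which matches the "all articles retrieved" case described above.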

Initially, when the profile is empty, the articles are retrieved in a random order. The graph for the initial profile (labeled ``'') is the result when all articles are scored zero, since none of them match the empty profile. Because this iteration is a random ordering of articles, it is possible that none of the retrieved articles are relevant. In that case, the user needs to program the agent by demonstration, providing a few examples of relevant documents. After feedback is provided on the relevant articles, the profile is modified and the search is performed again. The search is much improved, and the result is shown by the next line graph (labeled ``''). Successive iterations keep improving the performance of the search.
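The effect of an empty profile and of a first round of feedback can be illustrated with a toy keyword profile. This is only a sketch under stated assumptions: the additive weight update, the scoring by summed term weights, and the example articles are all hypothetical and are not the system's actual scoring or learning rule.

```python
from collections import Counter


def score(profile, article_words):
    """Similarity as the summed profile weight of the distinct words in the article."""
    return sum(profile.get(word, 0.0) for word in set(article_words))


def add_feedback(profile, article_words, weight=1.0):
    """Return a new profile with the words of a relevant article reinforced."""
    updated = Counter(profile)
    for word in set(article_words):
        updated[word] += weight
    return updated


# Hypothetical tokenized articles.
articles = {
    "n1": ["china", "trade", "talks", "beijing"],
    "n2": ["football", "league", "results"],
    "n3": ["beijing", "summit", "china"],
}

profile = Counter()  # empty initial profile: every article scores zero
before = {name: score(profile, words) for name, words in articles.items()}

# The user marks n1 as relevant; the profile is updated and the search rerun.
profile = add_feedback(profile, articles["n1"])
after = {name: score(profile, words) for name, words in articles.items()}

print(before)
print(after)
```

With the empty profile all scores tie at zero, so the ordering carries no information. After one example, the unseen article that shares terms with it (n3) already ranks above the unrelated one (n2).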

The profile is quite successful at converging on the terms commonly occurring in articles relating to ``China''. This can be seen from the final (at ) profile as shown in Table .

Given an empty initial profile (so there is nothing to unlearn) and consistent user feedback, the system is successful at specializing to user interests. The caveat is that the user interest must be amenable to a keyword-based description (see Chapter ).

MIT Media Lab - Autonomous Agents Group - agentmaster@media.mit.edu