The Information Filtering module (called YAIF, for Yet Another Information Filter) is responsible for retrieving articles from the database of news articles. YAIF takes the profiles, scores articles with respect to each profile, and selects the high-scoring articles to be presented to the user. YAIF is a time-intensive process and is therefore run offline. It is executed every night, so that filtered articles are available for every profile in the morning. This frequency is sufficient, since the data within a newsgroup does not change significantly within a day. This section describes the process of finding articles that match one profile. The same process is repeated for all available profiles.
Each profile is stored in a separate file. Two sets of newsgroups are searched
for each profile. One set is the list of newsgroups specified in the newsgroup
field of the profile. The other set consists of newsgroups that may not
be mentioned in the newsgroup field but contain articles that received
feedback and are therefore listed under ``ArtFeedback'' (see table ).
This is the case when users program by demonstration. For every profile, YAIF
retrieves each article from the two sets of newsgroups and scores it with
respect to the profile.
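The per-profile loop described above can be sketched as follows. This is only an illustration under stated assumptions: the profile layout (keys `newsgroups` and `art_feedback`), the helper names, the `fetch_articles` and `score` callables, and the use of a fixed score threshold to pick "high-scoring" articles are all hypothetical and are not taken from YAIF itself.

```python
from typing import Callable, Iterable, List, Set, Tuple


def candidate_newsgroups(profile: dict) -> Set[str]:
    """Union of the two newsgroup sets searched for one profile."""
    # Set 1: newsgroups named explicitly in the profile's newsgroup field.
    listed = set(profile.get("newsgroups", []))
    # Set 2: newsgroups of articles that received feedback (ArtFeedback),
    # even if those newsgroups are not named in the newsgroup field.
    from_feedback = {fb["newsgroup"] for fb in profile.get("art_feedback", [])}
    return listed | from_feedback


def filter_for_profile(
    profile: dict,
    fetch_articles: Callable[[str], Iterable[str]],
    score: Callable[[dict, str], float],
    threshold: float,
) -> List[Tuple[float, str]]:
    """Score every article in the candidate newsgroups; keep the high scorers."""
    selected = []
    for group in sorted(candidate_newsgroups(profile)):
        for article in fetch_articles(group):
            s = score(profile, article)
            # Assumption: "high-scoring" means "at or above a threshold".
            if s >= threshold:
                selected.append((s, article))
    # Present the best-scoring articles first.
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return selected
```

Running the nightly job would then amount to calling `filter_for_profile` once per profile file; how the selection cut-off is really chosen is not specified in this section.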
A typical article is shown in table .
Each article contains some structured and unstructured information. The unstructured
information is the actual information content of the article, namely the text.
The structured part is the meta-information about the text of the article. This
varies greatly depending on the source of news. There are a number of header
lines that an article must have to adhere to the Standard for Interchange of
USENET messages [20]. The headers mandated by the USENET protocol that are
interesting for filtering purposes include Date, From, and Subject. Some suppliers
of information also provide optional fields, such as pre-indexed keywords,
Organization, (number of) Lines, Sender, and Location.
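As a rough illustration of the split between structured and unstructured information, the sketch below separates an article's header fields from its text using Python's standard `email` parser, which handles the same header-then-blank-line layout. The function name `split_article` and the exact header spellings in `OPTIONAL` (in particular `Keywords` and `Location`) are assumptions made for the example, not a description of how YAIF parses articles.

```python
from email import message_from_string
from typing import Dict, Tuple

# Headers named in the text as mandated and useful for filtering.
REQUIRED = ("Date", "From", "Subject")
# Optional fields some suppliers add; the exact spellings are assumed.
OPTIONAL = ("Keywords", "Organization", "Lines", "Sender", "Location")


def split_article(raw: str) -> Tuple[Dict[str, str], str]:
    """Return (structured header fields, unstructured body text)."""
    msg = message_from_string(raw)
    structured = {}
    for name in REQUIRED + OPTIONAL:
        value = msg[name]  # None when the header is absent
        if value is not None:
            structured[name] = value
    # For a plain (non-multipart) article the payload is the body string.
    text = msg.get_payload()
    return structured, text
```

Headers that a supplier omits are simply missing from the returned dictionary, which reflects how the structured part varies from one news source to another.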
Each candidate article evaluated by YAIF is first converted to its representation before it can be scored. The method of translating documents to their representations is described in the next section.