Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
A text retrieval method called the thematic geographical search method has been developed and applied to a Japanese encyclopedia called the World Encyclopædia. In this method, th...
We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any gene...
Stemming is a technique which aims to extract common suffixes of words. Thus, words which are literally differhave a common stem, may be abstracted by their common stem. The under...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...