Abstract— The Topic Detection and Tracking (TDT) research community investigates information retrieval methods for organizing a constantly arriving stream of news articles by the...
James Allan, Stephen M. Harding, David Fisher, Alv...
Given the large heterogeneity of the World Wide Web, using metadata on the search engines side seems to be a useful track for information retrieval. Though, because a manual quali...
Camille Prime-Claverie, Michel Beigbeder, Thierry ...
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine...
Cluster methods have been successfully applied in gene expression data analysis to address tumor classification. By grouping tissue samples into homogeneous subsets, more systema...
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...