The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine...
We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (...
Web pages, like people, are often known by others in a variety of contexts. When those contexts are sufficiently distinct, a page's importance may be better represented by mu...
Our approach to the Log Analysis for Digital Societies (LADS) task of LogCLEF 2009 is to define three different levels of performance: success, failure and strong failure. To inve...
As an increasing number of digital library projects embrace the harvesting of item-level descriptive metadata, issues of description granularity and concerns about potential loss ...
Muriel Foulonneau, Timothy W. Cole, Thomas G. Habi...