We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language mo...
WINACS (Web-based Information Network Analysis for Computer Science) is a project that incorporates many recent, exciting developments in data sciences to construct a Web-based co...
We present BlogScope (www.blogscope.net), a system for analyzing the Blogosphere. BlogScope is an information discovery and text analysis system that offers a set of unique featur...
In this poster we propose a granular approach for presenting web search results. Sentences, taken from the top documents, are used as fine-grained representations of document cont...