An approach to simultaneous document classification and word clustering is developed using a two-way mixture model of Poisson distributions. Each document is represented by a vect...
Named entities (e.g., "Kofi Annan", "Coca-Cola", "Second World War") are ubiquitous in web pages and other types of document and often provide a simpl...
Felix Weigel, Klaus U. Schulz, Levin Brunner, Edua...
In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The di...
Claudine Santos Badue, Ricardo A. Baeza-Yates, Ber...
The rise of social media sites — blogs, wikis, and Digg — underscores the transformation of the Web to a participatory medium in which users are collaboratively creating, eval...
This paper presents a pioneering effort towards machine authentication of security documents like bank cheques, legal deeds, certificates, etc. that fall under the same class as f...