We consider the coverage testing problem where we are given a document and a corpus with a limited query interface and asked to find if the corpus contains a near-duplicate of th...
Ali Dasdan, Paolo D'Alberto, Santanu Kolay, Chris ...
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
An effective graphic interface is a key tool to improve the fruition of the results retrieved by an Information Retrieval (IR) system. In this work, we describe a two-dimensional...
Lorenzo De Stefani, Giorgio Maria Di Nunzio, Giorg...
In this paper, we present an extension of PHIL, a declarative language for filtering information from XML data. The proposed approach allows us to extract relevant data as well a...
Organizing textual documents into a hierarchical taxonomy is a common practice in knowledge management. Beside textual features, the hierarchical structure of directories reflect...
Yi Huang, Kai Yu, Matthias Schubert, Shipeng Yu, V...