XML diff algorithms proposed in the literature have focused on the structural analysis of the document. When XML is used for data exchange, or when different versions of a documen...
This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query to find quality documents related to the query topic. Connectivity...
Abstract. Mathematical documents, and their instrumentation by computers, have rich structure at the layers of presentation, metadata and semantics, as objects in a system for form...
Cezary Kaliszyk, Pierre Corbineau, Freek Wiedijk, ...
This paper presents an extensive experimental study of the state-of-the-art of XML compression tools. The study reports the behavior of nine XML compressors using a large corpus o...
In this paper we propose a completely unsupervised method for open-domain entity extraction and clustering over query logs. The underlying hypothesis is that classes defined by mi...