Automatically generated HTML, as produced by WYSIWYG programs, typically contains much repetitive and unnecessary markup. This paper identifies aspects of such HTML that may be al...
We introduce the problem of query decomposition, where we are given a query and a document retrieval system, and we want to produce a small set of queries whose union of resulting...
Francesco Bonchi, Carlos Castillo, Debora Donato, ...
Partial-match queries return data items that contain a subset of the query keywords and order the results based on the statistical properties of the matched keywords. The...
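The snippet does not say which statistical properties are used for ordering. The following is a minimal Python sketch that assumes inverse document frequency (IDF) as that statistic. The function name `partial_match` and the toy collection are hypothetical. A data item qualifies if it shares any non-empty subset of the query keywords, and it is scored by the summed IDF of the keywords it matched.

```python
import math
from typing import Dict, List, Set, Tuple


def partial_match(docs: Dict[str, Set[str]], query: Set[str]) -> List[Tuple[str, float]]:
    """Rank items sharing at least one query keyword.

    Assumption (not from the source): the ranking statistic is the sum of
    IDF weights of the matched keywords.
    """
    n = len(docs)
    # Document frequency of each query keyword across the collection.
    df = {t: sum(1 for terms in docs.values() if t in terms) for t in query}
    # Rarer keywords get higher weight; keywords that never occur are skipped.
    idf = {t: math.log(n / df[t]) for t in query if df[t] > 0}
    results = []
    for doc_id, terms in docs.items():
        matched = query & terms
        if matched:  # partial match: any non-empty subset of the query keywords
            results.append((doc_id, sum(idf[t] for t in matched)))
    # Highest score first; ties broken by id so the order is deterministic.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results
```

For example, with `docs = {"d1": {"arabic", "handwriting"}, "d2": {"arabic", "script", "online"}, "d3": {"query", "sampling"}}` and `query = {"arabic", "online"}`, item `d2` matches both keywords and ranks first, `d1` matches only the more common keyword `arabic` and ranks second, and `d3` is excluded. IDF is used here only because it is a common, simple choice; any other per-keyword statistic could be substituted in the `idf` mapping.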
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
In this paper, we present a multi-level recognizer for online Arabic handwriting. In Arabic script (handwritten and printed), cursive writing is not a style: it is an inher...