This paper presents a new approach for representing multidimensional data by a compact number of bases. We consider the multidimensional data as tensors instead of matrices or vec...
Using language technology for text analysis and light-weight ontologies as a content-mediating level, we acquire indexing patterns from vast amounts of indexing data for Englishla...
We describe a methodology for retrieving document images from large, extremely diverse collections. First we perform content extraction, that is, the location and measurement of reg...
In this paper, we present a technique for visual analysis of documents based on the semantic representation of text in the form of a directed graph, referred to as a semantic graph....
Delia Rusu, Blaz Fortuna, Dunja Mladenic, Marko Gr...
This paper presents a new approach for the binarization of seriously degraded manuscripts. We introduce a new technique based on a Markov Random Field (MRF) model of the document. ...