The information presented in a document often consists of primary content as well as supporting material such as explanatory notes, detailed derivations, illustrations, and the li...
Bay-Wei Chang, Jock D. Mackinlay, Polle Zellweger,...
Existing Language Identification (LID) approaches do reach 100% precision in most common situations when dealing with documents written in just one language and when those docu...
Research articles typically introduce new results or findings and relate them to knowledge entities of immediate relevance. However, a large body of context knowledge related to t...
It has been shown that the computation time of Document Image Decoding can be significantly reduced by employing heuristics in the search for the best decoding of a text line. In ...
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential ...
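The abstract is cut off, so the sketch below is an assumption about the details. It follows the sequential IB procedure as it is commonly described: start from a random hard partition of the rows x of p(x, y) into k clusters, then repeatedly draw each x out of its cluster and merge it back into the cluster t with the smallest merge cost, (p(x) + p(t)) times the weighted Jensen-Shannon divergence between p(y|x) and p(y|t). The names `sequential_ib`, `merge_cost`, `weighted_js`, and the parameters `max_sweeps` and `seed` are hypothetical, and the code is written in pure Python for clarity rather than speed.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_js(p, q, w1, w2):
    """Jensen-Shannon divergence of p and q with prior weights w1 + w2 == 1."""
    m = [w1 * a + w2 * b for a, b in zip(p, q)]
    return w1 * kl(p, m) + w2 * kl(q, m)


def merge_cost(px, py_x, pt, py_t):
    """Loss in I(T;Y) from merging the singleton {x} into cluster t."""
    if pt <= 1e-12:  # empty cluster: x simply forms its own cluster
        return 0.0
    total = px + pt
    return total * weighted_js(py_x, py_t, px / total, pt / total)


def sequential_ib(joint, k, max_sweeps=100, seed=0):
    """Hypothetical sketch: hard-cluster the rows x of p(x, y) into k clusters.

    joint: list of rows of nonnegative numbers summing to 1 overall; assumes
    every row has positive mass and len(joint) >= k.
    Returns one cluster label in range(k) per row.
    """
    rng = random.Random(seed)
    n, m = len(joint), len(joint[0])
    px = [sum(row) for row in joint]
    py_x = [[v / px[i] for v in row] for i, row in enumerate(joint)]

    # Random initial partition in which every cluster is non-empty.
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)

    # Per-cluster statistics: p(t) and the unnormalised joint p(t, y).
    pt = [0.0] * k
    pty = [[0.0] * m for _ in range(k)]
    for i, t in enumerate(labels):
        pt[t] += px[i]
        for j in range(m):
            pty[t][j] += joint[i][j]

    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # visit each x in random order
            old = labels[i]
            # Draw x out of its current cluster.
            pt[old] -= px[i]
            for j in range(m):
                pty[old][j] -= joint[i][j]
            costs = []
            for t in range(k):
                if pt[t] <= 1e-12:
                    costs.append(0.0)
                    continue
                # Clamp tiny negative round-off before normalising to p(y|t).
                py_t = [max(v, 0.0) / pt[t] for v in pty[t]]
                costs.append(merge_cost(px[i], py_x[i], pt[t], py_t))
            # Cheapest merge wins; ties are broken in favour of the old cluster.
            new = min(range(k), key=lambda t: (costs[t], t != old))
            labels[i] = new
            pt[new] += px[i]
            for j in range(m):
                pty[new][j] += joint[i][j]
            changed = changed or new != old
        if not changed:  # a full sweep with no reassignment: converged
            break
    return labels
```

For example, with four rows where rows 0 and 1 put their mass on the first two columns and rows 2 and 3 on the last two, `sequential_ib(joint, 2)` should group rows {0, 1} and {2, 3}. Because a move is accepted only when the new merge cost is strictly lower than re-merging into the old cluster, every move increases I(T;Y), which is why the loop can stop after a sweep with no moves.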