We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
Using a lexicon can often improve character recognition under challenging conditions, such as poor image quality or unusual fonts. We propose a flexible probabilistic model for c...
Jerod J. Weinman, Erik G. Learned-Miller, Allen R....
This paper presents a semiautomatic framework that aims to produce domain concept maps from text and then to derive domain ontologies from these concept maps. This methodology p...
Cheap and versatile cameras make it possible to easily and quickly capture a wide variety of documents. However, low-resolution cameras present a challenge to OCR because it is vi...
Charles E. Jacobs, Patrice Y. Simard, Paul A. Viol...
In this paper, we present some results of ongoing research involving the design and implementation, in an eGovernment scenario, of a multi-version repository of norm texts suppo...