Effective daily processing of large amounts of paper documents in office environments requires the application of semantic-based indexing techniques during the transformation of pa...
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
This paper proposes a new method for binarization of digital documents. The proposed approach performs binarization by using a heuristic algorithm with two different thresholds an...
George D. C. Cavalcanti, Eduardo F. A. Silva, Cleb...
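The binarization abstract above is cut off before it explains how its two thresholds are used, so the following is only a minimal sketch of one common two-threshold scheme (hysteresis-style region growing), not the authors' heuristic. The function name `binarize_two_thresholds`, the parameters `low` and `high`, and the 4-neighbour growth rule are all assumptions made for illustration.

```python
from collections import deque


def binarize_two_thresholds(gray, low, high):
    """Return a 0/1 grid where 1 marks ink and 0 marks background.

    Assumed scheme (not taken from the paper): pixels darker than `low`
    are certain ink, pixels at or above `high` are certain background,
    and pixels in between become ink only if connected to certain ink.
    """
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    queue = deque()

    # Seed the result with pixels that are unambiguously dark.
    for r in range(rows):
        for c in range(cols):
            if gray[r][c] < low:
                out[r][c] = 1
                queue.append((r, c))

    # Grow ink into ambiguous pixels that touch an ink pixel (4-neighbours).
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and out[nr][nc] == 0
                    and low <= gray[nr][nc] < high):
                out[nr][nc] = 1
                queue.append((nr, nc))
    return out


if __name__ == "__main__":
    page = [[10, 120, 250],
            [250, 250, 120]]
    for row in binarize_two_thresholds(page, low=50, high=200):
        print(row)
```

In this sketch the mid-gray pixel next to the dark seed is kept as ink, while the isolated mid-gray pixel in the second row is treated as background, which is the usual motivation for using two thresholds instead of one.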
Electronic documents are more easily copied and redistributed than paper documents. This is a major impediment to electronic publishing. Illegal redistribution can be discouraged ...
This paper presents the XML-based formats ALTO, TEI, and METS used in digital libraries and their relevance to data representation in a Document Image Analysis and Recognition (DIAR)...