OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
The rapid adoption of XML as the standard for data representation and exchange foreshadows a massive increase in the amounts of XML data collected, maintained, and queried over th...
Neoklis Polyzotis, Minos N. Garofalakis, Yannis E....
Information retrieval tries to identify relevant documents for an information need. The problems that an IR system should deal with include document indexing (which tries to extr...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Research advances in geospatial automated image analysis tools and feature extraction algorithms have matured in recent times to levels of practical applicability. The consolidati...