ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Abstract. Effective and efficient management and manipulation of XML documents requires stable decisions at the time a document enters the XML DBMS to provide for storage structure...
Abstract. In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...
Form document analysis is one of the most essential tasks in document analysis and recognition. One of the most fundamental and crucial tasks is the extraction of the reference li...
This paper describes a new method for the classification of a HTML document into a hierarchy of categories. The hierarchy of categories is involved in all phases of automated docum...