Abstract. Automated modeling of appropriate and valid document descriptions is a central issue for the benefit and success of an ontology-based personal document management system. ...
Annett Mitschick, Ralf Nagel 0002, Klaus Meiß...
Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we h...
This paper describes an approach to attention-based layout segmentation using general principles of human visual perception to achieve this goal. The text is considered as tex...
This paper presents the XML-based formats ALTO, TEI, and METS used for Digital Libraries and their interest for data representation in a Document Image Analysis and Recognition (DIAR)...
A new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may be unsuccessful in the case that suff...