PixED (from Pixel to Electronic Document) aims to convert document images into structured electronic documents that a machine can read for information retrieval. The...
Typical application scenarios in the area of rich-media management, such as the continuous digitisation of the media production processes, the search and retrieval tasks in a grow...
Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spa...
Yiqun Liu, Rongwei Cen, Min Zhang, Shaoping Ma, Li...
This paper outlines the new resource technologies, products and applications that have been constructed during the development of a multi-modal (MM hereafter) corpus tool on the D...
As manufacturing processes grow more complex, the volume of data that has to be evaluated rises accordingly. The complexity and data volume make any kind of manual data a...
Peter Benjamin Volk, Martin Hahmann, Dirk Habich, ...