We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
The growing popularity of hypertext navigation systems and the availability of large documentary databases is leading to the design of navigation systems that allow to explore the...
Frantz Vichot, Francis Wolinski, Joseph Tomeh, Syl...
An algorithm is presented that automatically matches images of presentation slides to the symbolic source file (e.g., PowerPointTM or AcrobatTM ) from which they were generated. T...
Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the dat...
For the RISM A/II collection of musical incipits (short extracts of scores, taken from the beginning), we have established a ground truth based on the opinions of human experts. I...