We present a new algorithm to measure domain-specific readability. It iteratively computes the readability of domain-specific resources based on the difficulty of domain-specific c...
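The abstract above is cut off before it describes how the iteration works. The following is therefore only a hypothetical sketch of one way a mutual iteration between resources and concepts could be set up. It is not the authors' algorithm. It assumes that documents and concepts form a bipartite graph. A concept's difficulty is taken as the mean difficulty of the documents that mention it. A document's difficulty blends an initial seed estimate with the mean difficulty of its concepts, which is a damped averaging scheme similar in spirit to label propagation. Higher scores mean harder, that is, less readable. The names `iterate_difficulty`, `seed`, and `alpha` are illustrative assumptions, and so is the seeding scheme.

```python
from typing import Dict, List, Tuple


def iterate_difficulty(
    doc_concepts: Dict[str, List[str]],
    seed: Dict[str, float],
    alpha: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical sketch: alternate between document and concept difficulty.

    Assumption (not from the source): concept difficulty is the mean
    difficulty of documents mentioning it; document difficulty blends a seed
    estimate (weight alpha) with the mean difficulty of its concepts.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    # Invert the bipartite graph: concept -> documents that mention it.
    concept_docs: Dict[str, List[str]] = {}
    for doc, concepts in doc_concepts.items():
        for c in concepts:
            concept_docs.setdefault(c, []).append(doc)

    def concept_step(scores: Dict[str, float]) -> Dict[str, float]:
        # Each concept's difficulty is the average of its documents' scores.
        return {
            c: sum(scores[d] for d in docs) / len(docs)
            for c, docs in concept_docs.items()
        }

    doc_score = {d: seed[d] for d in doc_concepts}  # start from the seeds
    for _ in range(max_iter):
        concept_score = concept_step(doc_score)
        new_doc_score: Dict[str, float] = {}
        for doc, concepts in doc_concepts.items():
            if concepts:
                mean_c = sum(concept_score[c] for c in concepts) / len(concepts)
                # Damping toward the seed keeps the update a contraction.
                new_doc_score[doc] = alpha * seed[doc] + (1 - alpha) * mean_c
            else:
                new_doc_score[doc] = seed[doc]
        delta = max(
            (abs(new_doc_score[d] - doc_score[d]) for d in doc_concepts),
            default=0.0,
        )
        doc_score = new_doc_score
        if delta < tol:
            break

    # Recompute concept scores from the final document scores.
    return doc_score, concept_step(doc_score)
```

As a usage example, `iterate_difficulty({"a": ["x", "y"], "b": ["y", "z"]}, seed={"a": 0.2, "b": 0.8})` converges to document scores of roughly 0.3 and 0.7. The shared concept `y` ends up at 0.5. Damping toward the seed (`alpha > 0`) is a deliberate choice in this sketch. Without it, plain averaging from uniform or symmetric starting values can collapse to a trivial fixed point.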
Large collections of scanned documents (books and journals) are now available in Digital Libraries. The most common method for retrieving relevant information from these collectio...
Although the Web lets users freely browse and publish information, most Web information, unlike that of conventional mass media, is not authorized. Therefore, it is not always credibl...
In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedu...
Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can he...