A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen
Maintenance of large Web sites is a complex task, similar in some sense to software maintenance. Content should be separated from the formatting rules, allowing independent develo...
Rodrigo Giacomini Moro, Renata de Matos Galante, C...
In this paper, we introduce a system that aims to recognize chart images using a model-based approach. First, basic chart models are designed for four different chart typ...
For documents with complex or atypical annotations, multihierarchical structures play the role of the document tree in traditional XML documents. We define a model of overlapping...
This paper presents an approach for modeling landmark sites such as the Statue of Liberty based on large-scale contaminated image collections gathered from the Internet. Our system...
Xiaowei Li, Changchang Wu, Christopher Zach, Svetl...