In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query rou...
Sebastian Michel, Matthias Bender, Peter Triantafi...
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
In this paper, we propose a multimodal Web image retrieval technique based on multi-graph enabled active learning. The main goal is to leverage the heterogeneous data on the Web t...
Taking advantage of the well-known cluster hypothesis that “closely associated documents tend to be relevant to the same request”, we can use inter-document similarity to prov...