We describe the objectives and organization of the CLEF 2007 ad hoc track and discuss the main characteristics of the tasks offered to test monolingual and cross-language textual d...
Giorgio Maria Di Nunzio, Nicola Ferro, Thomas Mand...
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are increasingly inadequate. While we often search for various ...
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...
A commercial Web page typically contains many information blocks. Apart from the main content blocks, it usually has such blocks as navigation panels, copyright and privacy notice...