Sciweavers

CLEF
2007
Springer
16 years 24 days ago
CLEF 2007: Ad Hoc Track Overview
We describe the objectives and organization of the CLEF 2007 ad hoc track and discuss the main characteristics of the tasks offered to test monolingual and cross-language textual d...
Giorgio Maria Di Nunzio, Nicola Ferro, Thomas Mand...
CIKM
2011
Springer
14 years 6 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov
SIGMOD
2007
ACM
16 years 6 months ago
Supporting entity search: a large-scale prototype search engine
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are increasingly inadequate. While we often search for various ...
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
KDD
2002
ACM
16 years 7 months ago
Topic-conditioned novelty detection
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...
KDD
2003
ACM
16 years 7 months ago
Eliminating noisy information in Web pages for data mining
A commercial Web page typically contains many information blocks. Apart from the main content blocks, it usually has such blocks as navigation panels, copyright and privacy notice...
Lan Yi, Bing Liu, Xiaoli Li