Sciweavers

» Retrieval of Partial Documents
WWW
2009
ACM
16 years 7 months ago
User-centric content freshness metrics for search engines
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Ali Dasdan, Xinh Huynh
WWW
2007
ACM
16 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
KDD
2002
ACM
16 years 7 months ago
Topic-conditioned novelty detection
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...
SIGMOD
2008
ACM
16 years 6 months ago
Building query optimizers for information extraction: the SQoUT project
Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relation...
Alpa Jain, Panagiotis G. Ipeirotis, Luis Gravano
EDBT
2004
ACM
16 years 6 months ago
Content-Based Routing of Path Queries in Peer-to-Peer Systems
Peer-to-peer (P2P) systems are gaining increasing popularity as a scalable means to share data among a large number of autonomous nodes. In this paper, we consider the case in whic...
Georgia Koloniari, Evaggelia Pitoura