Sciweavers

» Measuring Generality of Documents
WWW
2007
ACM
16 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2004
ACM
16 years 7 months ago
Web page summarization using dynamic content
Summarizing web pages has recently gained much attention from researchers. Until now, two main types of approaches have been proposed for this task: content- and context-based met...
Adam Jatowt
SIGIR
2006
ACM
16 years 22 days ago
Regularized estimation of mixture models for robust pseudo-relevance feedback
Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However, the performance of existing pseudo feedback meth...
Tao Tao, ChengXiang Zhai
DOCENG
2008
ACM
15 years 8 months ago
PrintMonkey: giving users a grip on printing the web
Web content is notoriously difficult to capture on a printed page due to inconsistent and undesired results. Items that users may not want to print, such as media, navigation menu...
Jennifer Baldwin, James A. Rowson, Yvonne Coady
DOCENG
2008
ACM
15 years 8 months ago
Malan: a mapping language for the data manipulation
Malan is a MApping LANguage that allows the generation of transformation programs by specifying a schema mapping between a source and target data schema. By working at the schema ...
Arnaud Blouin, Olivier Beaudoux, Stéphane L...