Sciweavers

KDD 2006, ACM
Understanding Content Reuse on the Web: Static and Dynamic Analyses
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...

WWW 2011, ACM
Two-stream indexing for spoken web search
This paper presents two-stream processing of audio to index the audio content for Spoken Web search. The first stream indexes the meta-data associated with a particular audio doc...
Jitendra Ajmera, Anupam Joshi, Sougata Mukherjea, ...

CPM 2000, Springer
Identifying and Filtering Near-Duplicate Documents
Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...
Andrei Z. Broder
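
The abstract above mentions estimating resemblance from a fixed-size "sketch". The Python below is a minimal sketch of the standard min-hash construction, assuming word 4-shingles and seeded BLAKE2b hashes standing in for random permutations. The function names, the shingle width `w`, and the sketch size `k` are illustrative choices, not taken from the paper, whose exact scheme may differ.

```python
import hashlib
import random
import re


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles of a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Exact resemblance: |A & B| / |A | B| over the two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle: tuple[str, ...], seed: int) -> int:
    # Seeded 64-bit hash used as a stand-in for a random permutation
    # (an assumption: true min-wise independent permutations are only approximated).
    data = f"{seed}:{' '.join(shingle)}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(s: set, k: int = 128, seed: int = 0) -> list[int]:
    """Fixed-size sketch: the minimum hash value under each of k seeded hashes."""
    if not s:
        raise ValueError("cannot sketch an empty shingle set")
    rng = random.Random(seed)
    seeds = [rng.getrandbits(32) for _ in range(k)]
    return [min(_hash(x, sd) for x in s) for sd in seeds]


def estimate(sk_a: list[int], sk_b: list[int]) -> float:
    """Fraction of positions where two sketches agree; approximates resemblance."""
    return sum(x == y for x, y in zip(sk_a, sk_b)) / len(sk_a)
```

Both documents must be sketched with the same `seed` so that position i uses the same hash function in each sketch. The estimate's standard error is about sqrt(r(1 - r)/k), so a larger `k` trades storage for accuracy.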

DOCENG 2007, ACM
Elimination of junk document surrogate candidates through pattern recognition
A surrogate is an object that stands for a document and enables navigation to that document. Hypermedia is often represented with textual surrogates, even though studies have show...
Eunyee Koh, Daniel Caruso, Andruid Kerne, Ricardo ...

WWW 2001, ACM
Algorithms and programming models for efficient representation of XML for Internet applications
XML is poised to take the World Wide Web to the next level of innovation. XML data, large or small, with or without associated schema, will be exchanged between increasing number ...
Neel Sundaresan, Reshad Moussa