Sciweavers

6974 search results - page 966 / 1395
» Querying Semi-Structured Data
WWW
2007
ACM
16 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
KDD
2008
ACM
128 views, Data Mining
16 years 7 months ago
Scaling up text classification for large file systems
We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifie...
George Forman, Shyamsundar Rajaram
VLDB
2007
ACM
121 views, Database
16 years 7 months ago
Ranked Subsequence Matching in Time-Series Databases
Existing work on similar sequence matching has focused on either whole matching or range subsequence matching. In this paper, we present novel methods for ranked subsequence match...
Wook-Shin Han, Jinsoo Lee, Yang-Sae Moon, Haifeng ...
SIGMOD
2004
ACM
174 views, Database
16 years 7 months ago
PIPES - A Public Infrastructure for Processing and Exploring Streams
PIPES is a flexible and extensible infrastructure providing fundamental building blocks to implement a data stream management system (DSMS). It is seamlessly integrated into the J...
Bernhard Seeger, Jürgen Krämer
EDBT
2004
ACM
131 views, Database
16 years 7 months ago
Declustering Two-Dimensional Datasets over MEMS-Based Storage
Due to the large difference between seek time and transfer time in current disk technology, it is advantageous to perform large I/O using a single sequential access rather than mu...
Hailing Yu, Divyakant Agrawal, Amr El Abbadi