Sciweavers

» Mining Transformed Data Sets
SIGMOD
2008
ACM
16 years 6 months ago
Efficient lineage tracking for scientific workflows
Data lineage and data provenance are key to the management of scientific data. Not knowing the exact provenance and processing pipeline used to produce a derived data set often re...
Thomas Heinis, Gustavo Alonso
WWW
2008
ACM
16 years 7 months ago
Measuring extremal dependencies in web graphs
We analyze dependencies in power law graph data (Web sample, Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. ...
Yana Volkovich, Nelly Litvak, Bert Zwart
KDD
2002
ACM
16 years 7 months ago
Interactive deduplication using active learning
Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
Sunita Sarawagi, Anuradha Bhamidipaty
PAKDD
2009
ACM
16 years 1 months ago
Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces
Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To addre...
Su Yan, Hai Wang, Dongwon Lee, C. Lee Giles
ICDM
2009
IEEE
16 years 1 months ago
Active Learning with Adaptive Heterogeneous Ensembles
One common approach to active learning is to iteratively train a single classifier by choosing data points based on its uncertainty, but it is nontrivial to design uncertainty ...
Zhenyu Lu, Xindong Wu, Josh Bongard