Sciweavers

» Efficient Discovery of Confounders in Large Data Sets
DPD
2006
Efficient parallel processing of range queries through replicated declustering
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so...
Hakan Ferhatosmanoglu, Ali Saman Tosun, Guadalupe ...
PVLDB
2008
Discovering data quality rules
Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often ari...
Fei Chiang, Renée J. Miller
PODS
2005
ACM
Multi-structural databases
We introduce the Multi-Structural Database, a new data framework to support efficient analysis of large, complex data sets. An instance of the model consists of a set of data obje...
Ronald Fagin, Ramanathan V. Guha, Ravi Kumar, Jasm...
VLDB
1997
ACM
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
A new access method, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. where object proximity is only defined by a di...
Paolo Ciaccia, Marco Patella, Pavel Zezula
JMLR
2010
Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation
Collapsed Gibbs sampling is a frequently applied method to approximate intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling ...
Han Xiao, Thomas Stibor