A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so...
Hakan Ferhatosmanoglu, Ali Saman Tosun, Guadalupe ...
Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often ari...
We introduce the Multi-Structural Database, a new data framework to support efficient analysis of large, complex data sets. An instance of the model consists of a set of data obje...
Ronald Fagin, Ramanathan V. Guha, Ravi Kumar, Jasm...
A new access method, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. where object proximity is only defined by a di...
Collapsed Gibbs sampling is a frequently applied method to approximate intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling ...