Today's enterprise databases are large and complex, often relating hundreds of entities. Enabling ordinary users to query such databases and derive value from them has been o...
Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are g...
Data stream applications have made use of statistical summaries to reason about the data using nonparametric tools such as histograms, heavy hitters, and join sizes. However, rela...
In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty,...
We consider the problem of joining data streams using limited cache memory, with the goal of producing as many result tuples as possible from the cache. Many cache replacement heu...