We consider the the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characteri...
We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically f...
An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries are subject to performance degradation. The notion of indexing on substrings (...
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses ca...
Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications...