Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely s...
Graham Cormode, Mayur Datar, Piotr Indyk, S. Muthu...
Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from othe...
Sung Ho Yoon, Young-Kyu Park, Soohyun Lee, Doil Ch...
Duplicate detection is the process of identifying multiple representations of a same real-world object in a data source. Duplicate detection is a problem of critical importance in...
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...
The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
Supporting legacy applications when the database schema evolves represents a long-standing challenge of practical and theoretical importance. Recent work has produced algorithms a...
Carlo Curino, Hyun Jin Moon, Alin Deutsch, Carlo Z...