Joins are essential for many data analysis tasks, but are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, implementation of join alg...
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programm...
Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGreg...
In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 producti...
Carlo Curino, Evan P. C. Jones, Samuel Madden, Har...
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in th...
Daisy Zhe Wang, Michael J. Franklin, Minos N. Garo...
Central to a data cleaning system are record matching and data repairing. Matching aims to identify tuples that refer to the same real-world object, and repairing is to make a dat...
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Weny...