Current approaches to developing information extraction (IE) programs have largely focused on producing precise IE results. As such, they suffer from three major limitations. First, ...
Warren Shen, Pedro DeRose, Robert McCann, AnHai Do...
Schema matching identifies elements of two given schemas that correspond to each other. Although there are many algorithms for schema matching, little has been written about build...
Philip A. Bernstein, Sergey Melnik, Michalis Petro...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries...
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techn...