A large part of the data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a hidden backend database to provide answers to user queries. ...
Map-Reduce is a programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. Through ...
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, Dougl...
Traditional duplicate elimination techniques are not applicable to many data stream applications. In general, precisely eliminating duplicates in an unbounded data stream is not f...
We define a match join of R and S with predicate θ to be a subset of the θ-join of R and S such that each tuple of R and S contributes to at most one result tuple. Match joins and t...
Ameet Kini, Srinath Shankar, Jeffrey F. Naughton, ...
This paper introduces the Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summari...