MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
A commodity I/O device has no support for virtualization. A VMM can assign such a device to a single guest with direct, fast, but insecure access by the guest's native device...
The ParAccel Analytic Database™ is a fast shared-nothing parallel relational database system with a columnar orientation, adaptive compression, memory-centric design, and an enha...
Yijou Chen, Richard L. Cole, William J. McKenna, S...
We define a match join of R and S with predicate θ to be a subset of the θ-join of R and S such that each tuple of R and S contributes to at most one result tuple. Match joins and t...
Ameet Kini, Srinath Shankar, Jeffrey F. Naughton, ...
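To make the match-join definition in the abstract above concrete, here is a minimal Python sketch. It is not the authors' algorithm: it uses a simple greedy strategy, and the function name `greedy_match_join` and the example predicate are hypothetical. It only illustrates the defining property, that the result is a subset of the θ-join in which every tuple of R and of S appears at most once.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")
S = TypeVar("S")


def greedy_match_join(
    r_rows: Iterable[R],
    s_rows: Iterable[S],
    theta: Callable[[R, S], bool],
) -> List[Tuple[R, S]]:
    """Return one valid match join of r_rows and s_rows under theta.

    Greedy illustration (an assumption, not the paper's method): each R
    tuple is paired with the first still-unused S tuple that satisfies
    theta. The result is a match join, but not necessarily a largest one.
    """
    s_list = list(s_rows)
    used = [False] * len(s_list)  # tracks S tuples already in the result
    result: List[Tuple[R, S]] = []
    for r in r_rows:  # each R tuple is visited once, so it is used at most once
        for j, s in enumerate(s_list):
            if not used[j] and theta(r, s):
                used[j] = True
                result.append((r, s))
                break  # stop after the first match for this R tuple
    return result


# Example: pair each value in R with an unused value in S that is >= it.
pairs = greedy_match_join([1, 2, 3], [2, 3, 3], lambda r, s: r <= s)
print(pairs)
```

The full θ-join of these inputs would contain seven pairs; the greedy match join keeps three of them, with no tuple reused. A greedy pass can miss a larger matching, so finding a maximum match join in general needs a proper bipartite matching algorithm.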
We describe an approach for pipelining nested data collections in scientific workflows. Our approach logically delimits arbitrarily nested collections of data tokens using special...