We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for wr...
Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. ...
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...
—The MPI-2 Standard, released in 1997, defined an interface for one-sided communication, also known as remote memory access (RMA). It was designed with the goal that it should p...
Vinod Tipparaju, William Gropp, Hubert Ritzdorf, R...
—The implementation and optimization of collective communication operations is an important field of active research. Such operations directly influence application performance...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
—Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemen...
Peter Bailey, Joe Myre, Stuart D. C. Walsh, David ...