Available GPUs provide ever more processing power, especially for multimedia and digital signal processing. Despite the tremendous progress in hardware and thus processing p...
The performance of many scientific applications for distributed memory platforms can be increased by utilizing multiprocessor-task programming. To obtain the minimum parallel runt...
There are many application classes where the users are flexible with respect to the output quality. At the same time, there are other constraints, such as the need for real-time ...
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser a...
Jun Shirako, David M. Peixotto, Vivek Sarkar, Will...
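The notion of a reduction described in the abstract above can be illustrated with a minimal sketch. It is not the phaser-based construct the authors introduce; it only shows the general pattern of separate tasks each supplying one value, which are then combined with a common operation. The names `task_contribution` and `parallel_reduce`, and the choice of a thread pool, are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def task_contribution(task_id):
    # Hypothetical per-task value: each task supplies the square of its id.
    return task_id * task_id


def parallel_reduce(op, num_tasks):
    # Run each task separately, collect the value it supplies,
    # then combine all values with the common operation `op`.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        contributions = list(pool.map(task_contribution, range(num_tasks)))
    return reduce(op, contributions)


# Sum reduction over four tasks: 0 + 1 + 4 + 9
total = parallel_reduce(operator.add, 4)
```

Any associative binary operation can stand in for the sum; for example, passing `max` yields the largest contribution instead.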
The dense nonsymmetric eigenproblem is one of the hardest linear algebra problems to solve effectively on massively parallel machines. Rather than trying to design a "black box" ...