Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. The difficulty is not refo...
Data communications between producer instructions and consumer instructions through memory incur extra delays that degrade processor performance. In this paper, we introduce a new...
Main memory in many tera-scale systems requires tens of kilowatts of power. The resulting energy consumption increases system cost and the heat produced reduces reliability. Emerg...
Matthew E. Tolentino, Joseph Turner, Kirk W. Camer...
Most shared memory systems maximize performance by unpredictably resolving memory races. Unpredictable memory races can lead to nondeterminism in parallel programs, which can suff...
Derek Hower, Polina Dudnik, Mark D. Hill, David A....
Automatic Global Data Partitioning for Distributed Memory Machines (DMMs) is a difficult problem. In this work, we present a partitioning strategy called 'Hyperplane Partitioning...