Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due pr...
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leon...
Recently there has been a surge of interest in developing performance debugging tools to help programmers tune their applications for better memory performance [2, 4, 10]. These t...
Margaret Martonosi, Anoop Gupta, Thomas E. Anderso...
In this paper, we propose a novel integrated circuit and architectural level technique to reduce leakage power consumption in high-performance cache memories using single Vt (trans...
As multi-/many-core processors become prevalent, the programming language is important in constructing efficient parallel applications. In this work, we build a multithreaded video min...
Wenlong Li, Eric Li, Ran Meng, Tao Wang, Carole Du...
Speculative parallelization can provide significant sources of additional thread-level parallelism, especially for irregular applications that are hard to parallelize by conventio...