Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
Chip-multiprocessors are quickly gaining momentum in all segments of computing. However, the practical success of CMPs strongly depends on addressing the difficulty of multithread...
Sewook Wee, Jared Casper, Njuguna Njoroge, Yuriy T...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have ‘compatible’ demands on the processor’s shared resour...
In this paper we describe our experience with Teapot [7], a domain-specific language for writing cache coherence protocols. Cache coherence is of concern when parallel and distrib...
Satish Chandra, James R. Larus, Michael Dahlin, Br...