—We address the recently recognized privatization problem in software transactional memory (STM) runtimes, and introduce the notion of partially visible reads (PVRs) to heuristic...
Virendra J. Marathe, Michael F. Spear, Michael L. ...
Abstract—The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth re...
Kornilios Kourtis, Georgios I. Goumas, Nectarios K...
This paper presents a novel stateless, virtualized communication engine for sub-microsecond latency. Using a Field-Programmable-Gate-Array (FPGA) based prototype we show a latency...
We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of ...
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govi...
Heterogeneous supercomputers with combined general purpose and accelerated CPUs promise to be the future major architecture due to their wideranging generality and superior perfor...