Abstract. This paper investigates two types of overhead due to duplicated local computations, which are frequently encountered in the parallel software of overlapping domain decomp...
Many performance problems observed in high-end systems are actually caused by the runtime system and not the application code. Detecting these cases will require parallel performa...
Rashawn L. Knapp, Karen L. Karavanic, Douglas M. P...
This paper evaluates the tradeoffs involved in the design of the software-extended memory system of Alewife, a multiprocessor architecture that implements coherent shared memory throu...
This paper introduces the microarchitecture and logical implementation of the SMT (Simultaneous Multithreading) improvement of the Godson-2 processor, which is a 64-bit, four-issue, out-...
GPUs have recently evolved into very fast parallel co-processors capable of executing general purpose computations extremely efficiently. At the same time, multi-core CPUs evolutio...
George Teodoro, Rafael Sachetto Oliveira, Olcay Se...