With the increased clock frequency of modern, high-performance processors over 500 MHz, in some cases, limiting the power dissipation has become the most stringent design target. ...
Luca Benini, Giovanni De Micheli, Alberto Macii, E...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
This paper examines the performance of desktop applications running on the Microsoft Windows NT operating system on Intel x86 processors, and contrasts these applications to the p...
Dennis C. Lee, Patrick Crowley, Jean-Loup Baer, Th...
This paper presents a hybrid approach to automatic parallelization of computer programs which combines static extraction of threads (tasks) with dynamic scheduling for parallel an...
Ronald Moore, Melanie Klang, Bernd Klauer, Klaus W...