In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
This paper examines the performance of desktop applications running on the Microsoft Windows NT operating system on Intel x86 processors, and contrasts these applications to the p...
Dennis C. Lee, Patrick Crowley, Jean-Loup Baer, Th...
This paper presents a hybrid approach to automatic parallelization of computer programs which combines static extraction of threads (tasks) with dynamic scheduling for parallel an...
Ronald Moore, Melanie Klang, Bernd Klauer, Klaus W...
To achieve scalable parallel performance in Molecular Dynamics Simulation, we have modeled and implemented several dynamic spatial domain decomposition algorithms. The modeling is ...
Lars S. Nyland, Jan Prins, Ru Huai Yun, Jan Herman...
We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes forevaluatingdata parallel compilers appropriatefor any target parallel architectu...
Y. Charlie Hu, S. Lennart Johnsson, Dimitris Kehag...