Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocesso...
Walter Lee, Rajeev Barua, Matthew Frank, Devabhakt...
Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...
Field-Programmable Gate Arrays (FPGAs) are being employed in high performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Par...
Uday Bondhugula, Ananth Devulapalli, James Dinan, ...
Scientific, symbolic, and multimedia applications present diverse computing workloads with different types of inherent parallelism. Tomorrow’s processors will employ varying com...
Linda M. Wills, Tarek M. Taha, Lewis B. Baumstark ...
With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...