Sciweavers

3660 search results - page 366 / 732
» Parallel Program Archetypes
Sort
View
ICS
2010
Tsinghua U.
15 years 5 months ago
An approach to resource-aware co-scheduling for CMPs
We develop real-time scheduling techniques for improving performance and energy for multiprogrammed workloads that scale nonuniformly with increasing thread counts. Multithreaded ...
Major Bhadauria, Sally A. McKee
SASP
2009
IEEE
291views Hardware» more  SASP 2009»
16 years 1 months ago
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs
— As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route...
Alexandros Papakonstantinou, Karthik Gururaj, John...
IPPS
2010
IEEE
15 years 4 months ago
Speculative execution on multi-GPU systems
Abstract--The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of moder...
Gregory F. Diamos, Sudhakar Yalamanchili
IPPS
2009
IEEE
16 years 1 months ago
Minimizing startup costs for performance-critical threading
—Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically sc...
Anthony M. Castaldo, R. Clint Whaley
APPT
2009
Springer
16 years 1 months ago
Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices
SIMD extension is one of the most common and effective technique to exploit data-level parallelism in today’s processor designs. However, the performance of SIMD architectures i...
Asadollah Shahbahrami, Ben H. H. Juurlink