Sciweavers

3373 search results - page 435 / 675
» Malleable applications for scalable high performance computi...
Sort
View
ASPLOS
1998
ACM
15 years 11 months ago
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocesso...
Walter Lee, Rajeev Barua, Matthew Frank, Devabhakt...
SC
2009
ACM
16 years 1 months ago
Automating the generation of composed linear algebra kernels
Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...
FCCM
2006
IEEE
107views VLSI» more  FCCM 2006»
16 years 22 days ago
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths
Field-Programmable Gate Arrays (FPGAs) are being employed in high performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Par...
Uday Bondhugula, Ananth Devulapalli, James Dinan, ...
WCRE
2002
IEEE
15 years 11 months ago
Estimating Potential Parallelism for Platform Retargeting
Scientific, symbolic, and multimedia applications present diverse computing workloads with different types of inherent parallelism. Tomorrow’s processors will employ varying com...
Linda M. Wills, Tarek M. Taha, Lewis B. Baumstark ...
PVM
2010
Springer
15 years 5 months ago
Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems
With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...
Gábor Dózsa, Sameer Kumar, Pavan Bal...