Sciweavers

PDP
2009
IEEE
A Parallel Implementation of the 2D Wavelet Transform Using CUDA
There is a multicore platform that is currently attracting enormous attention due to its tremendous potential in terms of sustained performance: the NVIDIA Tesla boards. The...
Joaquín Franco, Gregorio Bernabé, Ju...
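The 2D wavelet transform named in the title above is separable: a 1D transform is applied to every row and then to every column, and each row (or column) can be processed independently, which is the kind of data parallelism a GPU implementation can exploit. The sketch below assumes the simple orthonormal Haar filter and a single decomposition level; the filters and CUDA kernels used in the paper are not shown, and the function names `haar_1d` and `haar_2d` are hypothetical.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns the approximation (low-pass) coefficients followed by the
    detail (high-pass) coefficients.
    """
    n = len(signal)
    if n % 2:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[2 * i] + signal[2 * i + 1]) * s for i in range(n // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * s for i in range(n // 2)]
    return approx + detail


def haar_2d(image):
    """One level of a separable 2D Haar DWT: all rows first, then all columns.

    Every call to haar_1d below is independent of the others, so in a
    parallel version each row (then each column) could go to its own thread.
    """
    rows = [haar_1d(row) for row in image]
    n_rows, n_cols = len(rows), len(rows[0])
    cols = [haar_1d([rows[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # Transpose back so that the result is indexed as result[row][col].
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]
```

Because the Haar basis used here is orthonormal, the sum of squared coefficients equals the sum of squared pixel values, which makes a convenient sanity check.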
PDP
2009
IEEE
High Throughput Intra-Node MPI Communication with Open-MX
The increasing number of cores per node in high-performance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...
Brice Goglin
SC
2009
ACM
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a tas...
Fengguang Song, Asim YarKhan, Jack Dongarra
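The abstract above describes executing dense linear algebra as a set of tasks that are scheduled dynamically. A minimal sketch of that idea, under stated assumptions: a right-looking tiled Cholesky factorization is expressed as a task graph whose edges come from tracking the last task that wrote each tile, and a scheduler dispatches each task to a thread pool as soon as its predecessors have finished. The names `cholesky_task_graph` and `run_dynamic` are hypothetical, the tile kernels are stubs (Python threads would not speed up real kernels because of the GIL), and this is not the authors' runtime system.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def cholesky_task_graph(nt):
    """Task DAG of a right-looking tiled Cholesky on an nt x nt tile grid.

    Dependencies are derived by remembering the last writer of each tile
    (read-after-write and write-after-write), which is enough for this
    particular task ordering.
    """
    tasks, deps, last_writer = [], {}, {}

    def add(task, reads, writes):
        tasks.append(task)
        deps[task] = {last_writer[t] for t in reads + writes if t in last_writer}
        for t in writes:
            last_writer[t] = task

    for k in range(nt):
        add(("POTRF", k), [], [(k, k)])
        for i in range(k + 1, nt):
            add(("TRSM", i, k), [(k, k)], [(i, k)])
        for i in range(k + 1, nt):
            add(("SYRK", i, k), [(i, k)], [(i, i)])
            for j in range(k + 1, i):
                add(("GEMM", i, j, k), [(i, k), (j, k)], [(i, j)])
    return tasks, deps


def run_dynamic(tasks, deps, execute, workers=4):
    """Submit each task as soon as all of its predecessors have completed."""
    remaining = {t: len(deps[t]) for t in tasks}
    successors = {t: [] for t in tasks}
    for t in tasks:
        for d in deps[t]:
            successors[d].append(t)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(execute, t): t for t in tasks if remaining[t] == 0}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                task = running.pop(fut)
                fut.result()  # re-raise any exception raised by the task
                for s in successors[task]:
                    remaining[s] -= 1
                    if remaining[s] == 0:
                        running[pool.submit(execute, s)] = s


if __name__ == "__main__":
    order, lock = [], threading.Lock()

    def record(task):
        # Stub kernel: only records the order in which tasks ran.
        with lock:
            order.append(task)

    run_dynamic(*cholesky_task_graph(3), record)
    print(order)
```

Unlike a static schedule fixed in advance, the order here emerges at run time from which tasks happen to finish first; only the dependency constraints are guaranteed.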
ICS
2009
Tsinghua U.
Computer generation of fast Fourier transforms for the Cell Broadband Engine
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus Pü...
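FFT libraries and generators commonly build on a divide-and-conquer factorization of the discrete Fourier transform. The sketch below shows only the textbook recursive radix-2 Cooley-Tukey algorithm, checked against a direct O(n²) DFT; it is an illustration of the underlying recursion, not the paper's generated code, and none of the Cell-specific aspects mentioned in the abstract (SPE vectorization, explicit DMA transfers) are modeled. The names `fft` and `dft` are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) DFT, used as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```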
ICS
2009
Tsinghua U.
Parametric multi-level tiling of imperfectly nested loops
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...
Albert Hartono, Muthu Manikandan Baskaran, Cé...
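Multi-level tiling, as described in the abstract above, partitions an iteration space into tiles and then partitions each tile again, for example once per level of the memory hierarchy. "Parametric" tiling means the tile sizes are runtime values rather than compile-time constants. The sketch below assumes a matrix multiply with two tiling levels and runtime tile sizes `t1` (outer) and `t2` (inner) that need not divide `n`; it covers only the simpler perfectly nested case, not the imperfectly nested loops that are the paper's subject, and `matmul_tiled` is a hypothetical name.

```python
def matmul_tiled(a, b, n, t1, t2):
    """C = A * B for n x n matrices (lists of lists) with two tiling levels.

    t1 and t2 are runtime tile sizes; partial tiles at the matrix edge are
    handled by clamping every loop bound with min().
    """
    c = [[0.0] * n for _ in range(n)]
    # Level 1: outer tiles of size t1 in each dimension.
    for ii in range(0, n, t1):
        for jj in range(0, n, t1):
            for kk in range(0, n, t1):
                # Level 2: inner tiles of size t2 inside the current outer tile.
                for i2 in range(ii, min(ii + t1, n), t2):
                    for j2 in range(jj, min(jj + t1, n), t2):
                        for k2 in range(kk, min(kk + t1, n), t2):
                            # Point loops, clamped to both tile levels and to n.
                            for i in range(i2, min(i2 + t2, ii + t1, n)):
                                for j in range(j2, min(j2 + t2, jj + t1, n)):
                                    s = c[i][j]
                                    for k in range(k2, min(k2 + t2, kk + t1, n)):
                                        s += a[i][k] * b[k][j]
                                    c[i][j] = s
    return c
```

Each (i, j, k) point is visited exactly once because every tiling level partitions the range of the level above it; only the order of the accumulation changes, which is why the result matches an untiled multiply.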