Sciweavers

» Data Parallel Program Design
ICPP
2008
IEEE
16 years 1 month ago
Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches
This paper presents a two-part study on managing distributed NUCA (Non-Uniform Cache Architecture) L2 caches in a future manycore processor to obtain high single-thread program per...
Lei Jin, Sangyeun Cho
IFL
2004
Springer
15 years 12 months ago
Exploiting Single-Assignment Properties to Optimize Message-Passing Programs by Code Transformations
The message-passing paradigm is now widely accepted and used mainly for inter-process communication in distributed memory parallel systems. However, one of its disadvantages is the...
Alfredo Cristóbal-Salas, Andrey Chernykh, E...
HICSS
1995
IEEE
15 years 10 months ago
Instruction Level Parallelism
Abstract. We reexamine the limits of parallelism available in programs, using runtime reconstruction of program data-flow graphs. While limits of parallelism have been examined in...
LCPC
2000
Springer
15 years 10 months ago
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP
This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on an SMP machine. OSCAR multigrain parallelizing c...
Hironori Kasahara, Motoki Obata, Kazuhisa Ishizaka
IPPS
2010
IEEE
15 years 4 months ago
Out-of-core distribution sort in the FG programming environment
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk-I/...
Priya Natarajan, Thomas H. Cormen, Elena Riccio St...