Sciweavers

70 search results - page 6 / 14
» Programming for parallelism and locality with hierarchically...
ICPP
1993
IEEE
15 years 10 months ago
Automatic Parallelization Techniques for the EM-4
This paper presents a Data-Distributed Execution approach that exploits iteration-level parallelism in loops operating over arrays. It performs data-dependency analysis, based o...
Lubomir Bic, Mayez A. Al-Mouhamed
EUROPAR
2006
Springer
15 years 9 months ago
A Hierarchical CLH Queue Lock
Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality....
Victor Luchangco, Daniel Nussbaum, Nir Shavit
ICPP
1998
IEEE
15 years 10 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
ICS
2003
Tsinghua U.
15 years 11 months ago
Estimating cache misses and locality using stack distances
Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses, at compile time, using a mac...
Calin Cascaval, David A. Padua
CLUSTER
2004
IEEE
15 years 9 months ago
Predicting memory-access cost based on data-access patterns
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...