Sciweavers

4213 search results - page 424 / 843
» The Tau Parallel Performance System
ICPP
1998
IEEE
15 years 11 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
IPPS
1996
IEEE
15 years 11 months ago
ECO: Efficient Collective Operations for Communication on Heterogeneous Networks
PVM and other distributed computing systems have enabled the use of networks of workstations for parallel computation, but their approach of treating all networks as collections o...
Bruce Lowekamp, Adam Beguelin
ICPADS
2005
IEEE
16 years 14 days ago
I/O Processor Allocation for Mesh Cluster Computers
As cluster systems become increasingly popular, more and more parallel applications require not only computing power but also significant I/O performance. However, the I/O s...
Pangfeng Liu, Chun-Chen Hsu, Jan-Jan Wu
ICS
2004
Tsinghua U.
16 years 7 days ago
Evaluating support for global address space languages on the Cray X1
The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabili...
Christian Bell, Wei-Yu Chen, Dan Bonachea, Katheri...
SASP
2009
IEEE
16 years 1 months ago
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs
As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route...
Alexandros Papakonstantinou, Karthik Gururaj, John...