Sciweavers

4213 search results - page 401 / 843
» The Tau Parallel Performance System
ISPASS
2009
IEEE
16 years 1 month ago
Analyzing CUDA workloads using a detailed GPU simulator
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
ISCA
2010
IEEE
15 years 11 months ago
Forwardflow: a scalable core for power-constrained CMPs
Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increased core-per-socket c...
Dan Gibson, David A. Wood
EUROPAR
2010
Springer
15 years 8 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
HPDC
2005
IEEE
16 years 13 days ago
411 on scalable password service
In this paper we present 411, a password distribution system for high performance environments that provides security and scalability. We show that existing solutions such as NIS ...
Federico D. Sacerdoti, Mason J. Katz, Philip M. Pa...
IPPS
2002
IEEE
15 years 11 months ago
Cluster Load Balancing for Fine-Grain Network Services
This paper studies cluster load balancing policies and system support for fine-grain network services. Load balancing on a cluster of machines has been studied extensively in the...
Kai Shen, Tao Yang, Lingkun Chu