This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with...
The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs. At 400 MHz, this translates to a peak arithmetic rate of 16 GFLOPS on single-prec...
Ujval J. Kapasi, William J. Dally, Scott Rixner, J...
4 Automatic differentiation is the primary means of obtaining analytic5 derivatives from a numerical model given as a computer program. There-6 fore, it is an essential productivi...
Porting on grids complex MPI applications involving collective communications requires significant program modification, usually dedicated to a single grid structure. The diffi...
The emergence of heterogeneous many core architectures presents a unique opportunity for delivering order of magnitude performance increases to high performance applications by ma...