In this paper, we present a methodology for profiling parallel applications executing on the IBM PowerXCell 8i (commonly referred to as the “Cell” processor). Specifically, we...
Hikmet Dursun, Kevin J. Barker, Darren J. Kerbyson...
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
Porting on grids complex MPI applications involving collective communications requires significant program modification, usually dedicated to a single grid structure. The diffi...
Evaluation of high performance parallel systems is a delicate issue, due to the difficulty of generating workloads that represent, with fidelity, those that will run on actual sys...
We present a load-balancing technique that exploits the temporal coherence, among successive computation phases, in mesh-like computations to be mapped on a cluster of processors....
Biagio Cosenza, Gennaro Cordasco, Rosario De Chiar...