As semiconductor feature sizes decrease, interconnect delay is becoming a dominant component of processor cycle times. This creates a critical need to shift microarchitectural des...
In this paper, we evaluate an adaptive loop parallelization strategy (i.e., a strategy that allows each loop nest to execute using different number of processors if doing so is be...
Ismail Kadayif, Mahmut T. Kandemir, Mustafa Karak&...
In this work we present a predictive analytical model that encompasses the performance and scaling characteristics of a nondeterministic particle transport application, MCNP (Mont...
This paper presents the design and implementation of the MPI-IO interface for the Clusterfile parallel file system. The approach offers the opportunity of achieving a high corelat...
Abstract: Existing multicore systems already provide deep levels of thread parallelism. Hybrid programming models and composability of parallel libraries are very active areas of r...
Costin Iancu, Steven Hofmeyr, Filip Blagojevic, Yi...