Abstract. As processor complexity increases compilers tend to deliver suboptimal performance. Library generators such as ATLAS, FFTW and SPIRAL overcome this issue by empirically s...
We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...
We present an algorithm for implementing byte-range locks using MPI passive-target one-sided communication. This algorithm is useful in any scenario in which multiple processes of ...
Irregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Runtime ...
Abstract We propose a novel approach to preserve the synchronizing sequences of a circuit after retiming. The significance of this problem stems from the necessity of maintaining c...
Maher N. Mneimneh, Karem A. Sakallah, John Moondan...