This paper presents an extensive characterization, tuning, and optimization of parallel I/O on the Cray XT supercomputer, named Jaguar, at Oak Ridge National Laboratory. We have c...
Empirical performance evaluation of parallel systems and applications can generate significant amounts of performance data and analysis results from multiple experiments as perfo...
Kevin A. Huck, Allen D. Malony, Robert Bell, Alan ...
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-...
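Here is a minimal sketch of the transformation this abstract names. It is written in Python for illustration only: the function names are hypothetical, and lists stand in for arrays, so the locality gain itself is not observable here. The first pair of functions shows a legal fusion of a producer loop and a consumer loop. The second pair shows why fusion is not always legal. When the consumer reads `b[i + 1]`, a naively fused loop reads that element before the producer has written it.

```python
# Hypothetical example of loop fusion; not taken from the paper.

def unfused(a):
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):          # loop 1 produces b
        b[i] = a[i] * 2.0
    for i in range(n):          # loop 2 consumes b
        c[i] = b[i] + 1.0
    return c

def fused(a):
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):          # one pass: b[i] is consumed right after it is produced
        b[i] = a[i] * 2.0
        c[i] = b[i] + 1.0
    return c

def unfused_shifted(a):
    n = len(a)
    b = [0.0] * n
    c = [0.0] * (n - 1)
    for i in range(n):
        b[i] = a[i] * 2.0
    for i in range(n - 1):      # loop 2 reads one element ahead
        c[i] = b[i + 1] + 1.0
    return c

def naively_fused_shifted(a):
    n = len(a)
    b = [0.0] * n
    c = [0.0] * (n - 1)
    for i in range(n - 1):
        b[i] = a[i] * 2.0
        c[i] = b[i + 1] + 1.0   # b[i + 1] has not been written yet: wrong result
    b[n - 1] = a[n - 1] * 2.0   # last iteration of loop 1 peeled off
    return c
```

With `a = [1.0, 2.0, 3.0]`, `fused` and `unfused` agree. The naively fused shifted version does not match `unfused_shifted`, because the fused loop violates the dependence from the write of `b[i + 1]` to its later read.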
This paper deals with the use of parallel processing for multi-objective optimization in applications in which the objective functions, the restrictions, and hence also the soluti...