Future computing workloads will emphasize an architecture's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes ...
Seth Copen Goldstein, Herman Schmit, Matthew Moe, ...
In this paper, we present techniques and algorithms to improve the performance of various communication patterns on message-passing platforms where, for reasons of safety, user-le...
Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in rem...
Abstract. The High Performance Fortran (HPF) benchmark suite HPFBench was designed for evaluating the HPF language and compilers on scalable architectures. The functionality of the...
Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many imp...