Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine ...
Abstract. Branch Prediction is a common function in nowadays microprocessor. Branch predictor is duplicated into multiple copies in each core of a multicore and many-core processor...
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow comput...
We present a core calculus with two of X10's key constructs for parallelism, namely async and finish. Our calculus forms a convenient basis for type systems and static analys...
This paper describes the design and the implementation of parallel routines in the Heterogeneous ScaLAPACK library that solve a dense system of linear equations. This library is w...
Ravi Reddy Manumachu, Alexey L. Lastovetsky, Pedro...