Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...
Abstract—Software Distributed Shared Memory (SDSM) systems provide programmers with a shared memory programming environment across distributed memory architectures. In contrast t...
— Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurre...
—Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier tr...
Basilio B. Fraguela, Yevgen Voronenko, Markus P&uu...
The MPI datatype functionality provides a powerful tool for describing structured memory and file regions in parallel applications, enabling noncontiguous data to be operated on b...
Robert B. Ross, Robert Latham, William Gropp, Ewin...