This paper investigates the design of parallel algorithmic strategies that address the efficient use of both memory hierarchies within each processor and a multilevel clustered ...
Frank K. H. A. Dehne, Stefano Mardegan, Andrea Pie...
This paper describes how a portable benchmark suite that measures the ability of an MPI implementation to overlap computation and communication can be used to discover and diagnos...
Ron Brightwell, William Lawry, Arthur B. Maccabe, ...
In today’s high-performance computational environments, communication substrates often stand out as the major limiting factor to performance, accessibility, and stability. Such p...
Programming heterogeneous parallel computer systems is notoriously difficult, but MIMD models have proven to be portable across multi-core processors, clusters, and massively paral...
In this paper, we present an integration of SCI (Scalable Coherent Interface [5]) into the TCP/IP protocol stack of Linux for high-bandwidth, low-latency communication within two or...
Ralf Grosse Borger, Roger Butenuth, Hans-Ulrich He...