Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that...
The vector-clock size necessary to characterize causality in a distributed computation is bounded by the dimension of the partial order induced by that computation. In an arbitrar...
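For orientation, the baseline that this bound is measured against can be sketched as follows. This is a minimal illustration of the conventional vector clock with one component per process, not the construction proposed in the abstract above; the names `VectorClock`, `happened_before`, and `concurrent` are hypothetical and chosen only for this example.

```python
from typing import List


class VectorClock:
    """Conventional vector clock: one integer component per process."""

    def __init__(self, num_processes: int, pid: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * num_processes

    def tick(self) -> List[int]:
        # Local event: advance only this process's own component.
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> List[int]:
        # A send is a local event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, timestamp: List[int]) -> List[int]:
        # Merge by component-wise maximum, then count the receive event itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, timestamp)]
        self.clock[self.pid] += 1
        return list(self.clock)


def happened_before(a: List[int], b: List[int]) -> bool:
    # a causally precedes b iff a <= b in every component and a != b.
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: List[int], b: List[int]) -> bool:
    # Two distinct timestamps are concurrent when neither precedes the other.
    return a != b and not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
    e1 = p0.send()        # event on process 0
    e2 = p1.tick()        # independent event on process 1
    e3 = p1.receive(e1)   # process 1 receives the message from process 0
    print(happened_before(e1, e3), concurrent(e1, e2))
```

In this sketch every timestamp has as many components as there are processes; the abstract's claim concerns how far that size can be reduced when the dimension of the induced partial order is smaller.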
Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best perfo...
The trace cache is a recently proposed mechanism for achieving high instruction fetch bandwidth by buffering and reusing dynamic instruction traces. This work presents a new block-b...
For types of data visualization where the cost of producing images is high, and the relationship between the rendering parameters and the image produced is less than obvious, a vi...