As the industry moves toward larger-scale chip multiprocessors, the need to parallelize applications grows. High inter-thread communication delays, exacerbated by over-stressed hi...
Ram Rangan, Neil Vachharajani, Adam Stoler, Guilhe...
A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by • time, • TLB misses, • L1 misses, • L2 m...
With the ability to place large numbers of transistors on a single silicon chip, manufacturers have begun developing chip multiprocessors (CMPs) containing multiple processor core...
This paper is on the construction of a fault-tolerant and responsive server subsystem in an application context where the subsystem is accessed through an asynchronous network by ...
Commodity hardware and software are growing increasingly more complex, with advances such as chip heterogeneity and specialization, deeper memory hierarchies, ne-grained power ma...