Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw ...
This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a tas...
Abstract. We present a realizability model for a call-by-value, higherorder programming language with parametric polymorphism, general first-class references, and recursive types....
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...
Albert Hartono, Muthu Manikandan Baskaran, C&eacut...
Embedded systems, like general-purpose systems, can benefit from parallel execution on a symmetric multicore platform. Unfortunately, concurrency issues present in general-purpos...