Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
In modern digital ICs, the increasing demand for performance and throughput requires operating frequencies of hundreds of megahertz, and in several cases exceeding the gigahertz r...
In data centers hosting scaling Internet applications, operators face the tradeoff dilemma between resource efficiency and Quality of Service (QoS), and the root cause lies in wo...
—Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement ...