The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...
Functional validation of a processor design through execution of a suite of test programs is common industrial practice. In this paper, we develop a high-level architectural speci...
Thanh Nga Dang, Abhik Roychoudhury, Tulika Mitra, ...
This paper introduces Way Stealing, a simple architectural modification to a cache-based processor to increase data bandwidth to and from application-specific Instruction Set Exte...
Theo Kluter, Philip Brisk, Paolo Ienne, Edoardo Ch...
In recent years the computing power of graphics cards has increased significantly. Indeed, the growth in the computing power of these graphics cards is now several orders of magn...
Retargetable C compilers are nowadays widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. One frequ...
Manuel Hohenauer, Christoph Schumacher, Rainer Leu...