This paper proposes a parallel cycle-accurate microarchitectural simulator which efficiently executes its workload by splitting the simulation process along the time axis into many in...
Tiling is a crucial loop transformation for generating high-performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...
Albert Hartono, Muthu Manikandan Baskaran, Cé...
As multi-core microprocessors are becoming widely adopted, the need to extract thread-level parallelism (TLP) from single-threaded applications in a seamless fashion increases. In...
Md. Mafijul Islam, Alexander Busck, Mikael Engbom,...
NnSP is a stream-based programmable and code-level statically reconfigurable processor for realization of neural networks in embedded systems. NnSP is provided with a n...
Hadi Esmaeilzadeh, Pooya Saeedi, Babak Nadjar Araa...
Erlang is a concurrent functional language designed for developing large-scale, distributed, fault-tolerant systems. The primary implementation of the language is the Erlang/OTP s...
Daniel Luna, Mikael Pettersson, Konstantinos F. Sa...