This paper presents a helper thread prefetching scheme that is designed to work on loosely-coupled processors, such as in a standard chip multi-processor (CMP) system and in an in...
Changhee Jung, Daeseob Lim, Jaejin Lee, Yan Solihi...
The L2 cache is commonly managed using LRU policy. For workloads that have a working set larger than L2 cache, LRU behaves poorly, resulting in a great number of less reused lines...
While traditional approaches to code profiling help locate performance bottlenecks, they offer only limited support for removing these bottlenecks. The main reason is the lack of v...
We describe a family of reconfigurable parallel architectures for logic emulation. They are supposed to be applicable like conventional FPGAs, while covering a larger range of circ...
Tiny embedded systems have not been an ideal outfit for high performance computing due to their constrained resources. Limitations in processing power, battery life, communication ...