In this paper we show cache-friendly implementations of the Floyd-Warshall algorithm for the All-Pairs ShortestPath problem. We first compare the best commercial compiler optimiza...
Abstract. By simulating a real computer it is possible to gain a detailed knowledge of the cache memory utilization of an application, e.g., a partial differential equation (PDE) s...
Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, often it is not clear, whether the pred...
Achieving design closure is one of the biggest headaches for modern VLSI designers. This problem is exacerbated by high-level design automation tools that ignore increasingly impo...
Zhenyu (Peter) Gu, Jia Wang, Robert P. Dick, Hai Z...
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the per...