We present a cache locality optimization technique that can optimize a loop nest even if the arrays referenced have different layouts in memory. Such a capability is required for a...
Mahmut T. Kandemir, J. Ramanujam, Alok N. Choudhar...
Although SIMD (Single Instruction stream Multiple Data stream) parallel computers have existed for decades, it is only in the past few years that a new version of SIMD has evolved...
Techniques for aggressive optimization and parallelization of applications can have the side-effect of introducing copy instructions, register-to-register move instructions, into t...
There is an emerging class of real-time interactive applications that require the dynamic integration of task and data parallelism. An example is the Smart Kiosk, a free-standing ...
James M. Rehg, Kathleen Knobe, Umakishore Ramachan...
Abstract. Previously published methods for estimation of the worstcase execution time on contemporary processors with complex pipelines and multi-level memory hierarchies result in...