Non-uniform utilization of functional units in combination with hardware mechanisms such as clock gating leads to different power consumptions in different parts of a processor ch...
Abstract— Recent research has shown that forwarding speculative data to other processors before it is requested can improve the performance of multiprocessor systems. The most re...
Existing schemes for cache energy optimization incorporate a limited degree of dynamic associativity: either direct mapped or full available associativity (say 4-way). In this pap...
Abstract—On-chip power density and temperature are rising exponentially with decreasing feature sizes. This alarming trend calls for temperature management at every level of syst...
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and i...