Originally developed to connect processors and memories in multicomputers, prior research and design of interconnection networks have focused largely on performance. As these netw...
Predicated Execution can be used to alleviate the costs associated with frequently mispredicted branches. This is accomplished by trading the cost of a mispredicted branch for exe...
This paper explores the role of branch predictor organization in power/energy/performance tradeoffs for processor design. We find that as a general rule, to reduce overall energy ...
Dharmesh Parikh, Kevin Skadron, Yan Zhang, Marco B...
The increasing performance gap between processors and memory will force future architectures to devote significant resources towards removing and hiding memory latency. The two ma...
This paper advocates that cache coherence protocols use a bandwidth adaptive approach to adjust to varied system configurations (e.g., number of processors) and workload behaviors...
Milo M. K. Martin, Daniel J. Sorin, Mark D. Hill, ...