Performance evaluation using only a subset of programs from a benchmark suite is commonplace in computer architecture research. This is especially true during early design space e...
Predicated execution has been used to reduce the number of branch mispredictions by eliminating hard-to-predict branches. However, the additional instruction overhead and addition...
Hyesoon Kim, Onur Mutlu, Jared Stark, Yale N. Patt
Various architectural power reduction techniques have been proposed for on-chip caches in the last decade. In this paper, we first show that these power reduction techniques can b...
Ja Chun Ku, Serkan Ozdemir, Gokhan Memik, Yehea I....
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
As transistors keep shrinking and on-chip data caches keep growing, static power dissipation due to leakage of caches takes an increasing fraction of total power in processors. Se...