Exploiting speculative thread-level parallelism across modules, e.g., methods, procedures, or functions, have shown promise. However, misspeculations and task creation overhead ar...
The full-custom CMOS realization of a new modular sorting architecture is presented. The high-performance architecture is based on rank ordering, and on efficient implementation o...
For multi-gigahertz designs in nanometer technologies, data transfers on global interconnects take multiple clock cycles. In this paper, we propose a regular distributed register ...
The number of pipeline stages separating dynamic instruction scheduling from instruction execution has increased considerably in recent out-of-order microprocessor implementations...
In this paper we present an approach to boosting performance and tolerating latency by deferring non-critical instructions into a deferred queue for later processing. As such, ins...
Gregory A. Muthler, David Crowe, Sanjay J. Patel, ...