The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
: Several solutions have been developed to provide dataintensive applications with the highest possible data rates. Such solutions tried to utilize the available network resources ...
—The problem of automatically generating hardware modules from a high level representation of an application has been at the research forefront in the last few years. In this pap...
New media and signal processing applications demand ever higher performance while operating within the tight power constraints of mobile devices. A range of hardware implementatio...
Kevin Fan, Manjunath Kudlur, Ganesh S. Dasika, Sco...
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed a...
JaeWoong Chung, Hassan Chafi, Chi Cao Minh, Austen...