Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, prog...
Thomas F. Wenisch, Anastassia Ailamaki, Babak Fals...
There has been considerable recent interest in the support of transactional memory (TM) in both hardware and software. We present an intermediate approach, in which hardware is us...
Arrvindh Shriraman, Michael F. Spear, Hemayet Hoss...
The commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applica...
Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, ...
The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the ...
David E. Shaw, Martin M. Deneroff, Ron O. Dror, Je...
We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure, with FB-DIM...