To cope with the increasing difference between processor and main memory speeds, modern computer systems use deep memory hierarchies. In the presence of such hierarchies, the perf...
Margaret Martonosi, Anoop Gupta, Thomas E. Anderso...
Abstract. Most parallel systems on which MPI is used are now hierarchical: some processors are much closer to others in terms of interconnect performance. One of the most common su...
Hao Zhu, David Goodell, William Gropp, Rajeev Thak...
Granularity control is an effective means for trading power consumption with performance on dense shared memory multiprocessors, such as multi-SMT and multi-CMP systems. In this p...
Matthew Curtis-Maury, James Dzierwa, Christos D. A...
Abstract. As the disparity between processor and memory speed continues to widen, the exploitation of locality of reference in shared-memory multiprocessors becomes an increasingly...
Current processor allocation techniques for highly parallel systems are based on centralized front-end based algorithms. As a result, the applied strategies are restricted to stat...