Multicore architectures, which have multiple processing units on a single chip, are widely viewed as a way to achieve higher processor performance, given that thermal and power pr...
James H. Anderson, John M. Calandrino, UmaMaheswar...
While previous CPU- or memory-centric load balancing schemes are capable of achieving the effective usage of global CPU and memory resources in a cluster system, the cluster exhib...
Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson
Parallel programmers typically assume that all resources required for a program’s execution are dedicated to that purpose. However, in local and wide area networks, contention f...
Alain J. Roy, Ian T. Foster, William Gropp, Nichol...
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64(C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
Cooperative caching, which allows sharing and coordination of cached data among clients, is a potential technique to improve the data access performance and availability in mobile ...