This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for e...
Many large scale applications, have significant I/O requirements as well as computational and memory requirements. Unfortunately, limited number of I/O nodes provided by the conte...
Meenakshi A. Kandaswamy, Mahmut T. Kandemir, Alok ...
In this work we present a predictive analytical model that encompasses the performance and scaling characteristics of a nondeterministic particle transport application, MCNP (Mont...
Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that...
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to addres...
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...