The current technologies have made it possible to execute parallel applications across heterogeneous platforms. However, the performance models available do not provide adequate m...
Jameela Al-Jaroodi, Nader Mohamed, Hong Jiang, Dav...
In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...
Abstract. In this work an architecture of an automatically tuned linear algebra library proposed in previous works is extended in order to adapt it to platforms where both the CPU ...
Internet coordinate schemes have been proposed as a method for estimating minimum round trip time between hosts without direct measurement. In such a scheme, each host is assigned...
We present the first lock-free implementation of an extensible hash table running on current architectures. Our algorithm provides concurrent insert, delete, and find operations ...