The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s Blue Gene/L, which can acc...
This paper proposes efficient algorithms for implementing multicast in heterogeneous workstation/PC clusters. Multicast is an important operation in many scientific and industri...
In this paper, we present efficient methods for multidimensional array redistribution. Building on previous work, the basic-cycle calculation technique, we present a basic-block ...
As supercomputers are being built from an ever-increasing number of processing elements, the effort required to achieve a substantial fraction of the system peak performance is con...
During the last few years, GPUs have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics t...