Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these communication opera...
Although distributed object systems, for example RMI and CORBA, enable object-oriented programs to be easily distributed across a network, achieving acceptable performance usually...
Identifying and inferring performances of a network topology is a well known problem. Achieving this by using only end-to-end measurements at the application level is a method kno...
Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today's machines ...
Steven K. Reinhardt, James R. Larus, David A. Wood
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all stre...
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying ...