Double-precision floating-point arithmetic is inadequate for many scientific computations. This paper presents the design of a quadruple-precision floating-point multiplier tha...
Applying hardware-parameterized models to distributed systems can omit key bottlenecks, such as the full cost of inter-node communication in a shared mem...
We focus on an important problem in the space of ubiquitous computing, namely, programming support for the distributed heterogeneous computing elements that make up this environme...
Efficient load-balancing algorithms are key to many efficient parallel applications. Until now, research in this area has focused mainly on homogeneous schemes. Howeve...
This paper presents the design and implementation of an asynchronous data-staging strategy for file accesses based on ROMIO, the most popular MPI-IO distribution, and ZeptoOS, an ...