We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. Th...
Modern computing environments, such as enterprise data centers, Grids, and PlanetLab, introduce distributed services to address scalability, locality, and reliability. Web Service...
Robert Adams, Paul Brett, Subu Iyer, Dejan S. Milo...
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network composed of many clusters while ke...
Federico D. Sacerdoti, Mason J. Katz, Matthew L. M...
We present an approach for rendering the surface of particle-based fluids that is simple to implement, has real-time performance with a configurable speed/quality trade-off, and...
Wladimir J. van der Laan, Simon Green, Miguel Sain...