Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will ...
Petascale HPC systems are among the largest systems in the world. Intrepid, one such system, is a 40,000-node, 556-teraflop Blue Gene/P system that has been deployed at Argonne Na...
Narayan Desai, Rick Bradshaw, Cory Lueninghoener, ...
A block-level continuous data protection (CDP) system logs every disk block update from an application server (e.g., a file or DBMS server) to a storage system so that any disk u...
We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely coupled...
Safety-critical systems typically operate in unpredictable environments. Requirements for safety and reliability are in conflict with those for real-time responsiveness. Due to un...