Elasticity of cloud computing environments provides an economic incentive for automatic resource allocation of stateful systems running in the cloud. However, these systems have t...
Recent research has shown that even modern hard disks have complex failure modes that do not conform to “failstop” operation. Disks exhibit partial failures like block access ...
Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusse...
Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously col...
Sapan Bhatia, Abhishek Kumar, Marc E. Fiuczynski, ...
We discuss a generic architecture for building user-centric systems. The characteristic feature of such systems is a control loop that monitors the user’s state, and produces a h...
Gilbert Beyer, Moritz Hammer, Christian Kroiss, An...
Missing or faulty exception handling has caused a number of spectacular system failures and is a major cause of software failures in extensively tested critical systems. Prior wor...