Experimental studies have shown that electing a leader based on measurements of the underlying communication network can be beneficial. We use this approach to study the problem ...
Software configuration problems are a major source of failures in computer systems. In this paper, we present a new framework for categorizing configuration problems. We apply thi...
Archana Ganapathi, Yi-Min Wang, Ni Lao, Ji-Rong We...
Designing storage systems to provide business continuity in the face of failures requires the use of various data protection techniques, such as backup, remote mirroring, point-in-...
We present the concept of alternative functionality for improving dependability in distributed embedded systems. Alternative functionality is a mechanism that complements traditio...
The combination of continued technology scaling and increased on-chip transistor densities has made vulnerability to radiation induced soft errors a significant design concern. In...