Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because...
This paper describes the design, implementation, and performance evaluation of ST-TCP (Server fault-Tolerant TCP), which is an extension of TCP to tolerate TCP server failures. Th...
We consider a fault tolerant version of the metric facility location problem in which every city, j, is required to be connected to rj facilities. We give the first non-trivial ap...
To date, there is little evidence that modular reasoning about fault-tolerant systems can simplify the verification process in practice. We study this question using a prominent e...