The problem of cooperatively performing a set of t tasks in a decentralized setting where the computing medium is subject to failures is one of the fundamental problems in distrib...
Chryssis Georgiou, Alexander Russell, Alexander A....
A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorr...
David Walker, Lester W. Mackey, Jay Ligatti, Georg...
Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn int...
Many stateful services use the replicated state machine approach for high availability. In this approach, a service runs on multiple machines to survive machine failures. This pap...
Jacob R. Lorch, Atul Adya, William J. Bolosky, Ron...
This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares ...
Ben Vandiver, Hari Balakrishnan, Barbara Liskov, S...