Sciweavers

» Systems Failures
LADC
2011
Springer
14 years 9 months ago
Byzantine Fault-Tolerant Deferred Update Replication
Replication is a well-established approach to increasing database availability. Many database replication protocols have been proposed for the crash-stop failure model, ...
Fernando Pedone, Nicolas Schiper, José Enri...
ICAC
2005
IEEE
15 years 11 months ago
Distributed Troubleshooting Agents
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
DSOM
2004
Springer
15 years 11 months ago
ABHA: A Framework for Autonomic Job Recovery
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown...
PRDC
2007
IEEE
16 years 14 days ago
Implementation of a Flexible Membership Protocol on a Real-Time Ethernet Prototype
This paper describes the implementation of a processor-group membership protocol in an experimental real-time network. The protocol is appropriate for fault-tolerant distributed sy...
Raul Barbosa, António Ferreira, Johan Karls...
OSDI
2004
ACM
16 years 6 months ago
Microreboot - A Technique for Cheap Recovery
A significant fraction of software failures in large-scale Internet systems are cured by rebooting, even when the exact failure causes are unknown. However, rebooting can be expen...
George Candea, Shinichi Kawamoto, Yuichi Fujiki, G...