We present S, the first system to provide transparent, lowoverhead application record-replay and the ability to go live from replayed execution. S i...
The term “Self-healing” denotes the capability of a software system in dealing with bugs. Fault tolerance for dependable computing is to provide the specified service through ...
Reliable storage of data with concurrent read/write accesses (or query/update) is an ever recurring issue in distributed settings. In mobile ad hoc networks, the problem becomes e...
Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used d...
Ilias Iliadis, Robert Haas, Xiao-Yu Hu, Evangelos ...
This paper presents an approach to system-level optimization of error detection implementation in the context of fault-tolerant realtime distributed embedded systems used for safe...
Adrian Lifa, Petru Eles, Zebo Peng, Viacheslav Izo...