During their lifetime, embedded systems go through multiple changes to their runtime architecture. That is, threads, processes, and processors are added to or removed from the softwa...
We introduce a reliable memory system that can tolerate multiple transient errors in the memory words as well as transient errors in the encoder and decoder (corrector) circuitry....
In this paper, we combine two well-known approaches to fault tolerance: robustness and stabilization. Robustness is the ability of an algorithm to withstand permanent fa...
We present the design of a distributed store that offers various levels of security guarantees while tolerating a limited number of nodes that are compromised by an adversary. The...
Subramanian Lakshmanan, Mustaque Ahamad, H. Venkat...
Because of increasing hardware and software complexity, the running time of many computational science applications now exceeds the mean time to failure of high-performance com...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...