The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...
—In this paper, we classify multitolerant systems, i.e., systems that tolerate multiple classes of faults and provide potentially different levels of tolerance to them in terms o...
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
This paper proposes a variation of the Byzantine generals problem (or Byzantine consensus). Each general has a set of good plans and a set of bad plans. The problem is to make all...
Miguel Correia, Alysson Neves Bessani, Paulo Ver&i...
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting an...
Shuou Nomura, Matthew D. Sinclair, Chen-Han Ho, Ve...