CMOS technology trends are leading to an increasing incidence of hard (permanent) faults in processors. These faults may be introduced at fabrication or occur in the field. Wherea...
Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs,...
Alex Shye, Tipp Moseley, Vijay Janapa Reddi, Josep...
We propose a new fault tolerance metric for XOR-based erasure codes: the minimal erasures list (MEL). A minimal erasure is a set of erasures that leads to irrecoverable data loss ...
- Partitionability allows the creation of many physically independent subsystems, each of which retains an identical functionality as its parent network and has no communication in...
Abstract. While injecting fault during training has long been demonstrated as an effective method to improve fault tolerance of a neural network, not much theoretical work has been...