Abstract. With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
Multiprocessor SoCs are increasingly deployed in embedded systems with little or no security features built in. Code injection attacks are one of the most commonly encountered sec...
Krutartha Patel, Sridevan Parameswaran, Seng Lin S...
Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Over the past 30 years, many softw...
Chin-Yu Huang, Chu-Ti Lin, Sy-Yen Kuo, Michael R. ...
We present a methodology for the simulation of soft errors targeting future nano-technological devices. This approach efficiently scales the failure rate of individual devices ac...
Christian J. Hescott, Drew C. Ness, David J. Lilja
A fast fault-tolerant controller structure is presented, which is capable of recovering from transient faults by performing a rollback operation in hardware. The proposed fault-to...
Andre Hertwig, Sybille Hellebrand, Hans-Joachim Wu...