Both peer-to-peer and sensor networks have the fundamental characteristics of node churn and failures. Peers in P2P networks are highly dynamic, whereas sensors are not dependable...
Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
There has recently been increasing interest in using system virtualization to improve the dependability of HPC cluster systems. However, it is not cost-free and may come with som...
Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, ...
Technology trends are such that single event effects (SEE) are likely to become even more of a concern in the future. Decreasing feature sizes, lower operating voltage, and higher...
Fast hardware turnover in supercomputing centers, stimulated by rapid technological progress, results in high heterogeneity among HPC platforms, and necessitates that applications...