Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
While it is possible to accurately predict the execution time of a given iteration of an adaptive application, it is not generally possible to predict the data-dependent adaptive ...
In considering new security paradigms, it is often worthwhile to anticipate the direction and nature of future attack paradigms. We identify a class of attacks based on the idea o...
Michael E. Locasto, Angelos Stavrou, Angelos D. Ke...
We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization ...
This paper presents a high-level approach for assessing the performance behavior of complex scientific applications running on a high-performance system through simulation. The pr...
Thomas Fahringer, Nicola Mazzocca, Massimiliano Ra...