— Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
Abstract. Consensus is a fundamental building block used to solve many practical problems that appear on reliable distributed systems. In spite of the fact that consensus is being ...
In program debugging, finding a failing run is only the first step; what about correcting the fault? Can we automate the second task as well as the first? The AutoFix-E tool au...
Yi Wei, Yu Pei, Carlo A. Furia, Lucas S. Silva, St...
Multicast conferencing is a rapidly-growing area of Internet use. Audio, video and other media such as shared whiteboard data can be distributed efficiently between groups of confe...
Mourad Amad, Zahir Haddad, Lachemi Khenous, Kamal ...
In this work we propose an online reliability tracking framework that utilizes a hybrid network of on-chip temperature and delay sensors together with a circuit reliability macrom...