This paper explores the challenges associated with distributed application management in large-scale computing environments. In particular, we investigate several techniques for e...
Nikolay Topilski, Jeannie R. Albrecht, Amin Vahdat
In this paper the problem of fault-tolerant message routing in two-dimensional meshes, with each inner node having 4 neighbors, is investigated. It is assumed that some nodes/links...
This paper introduces the Sigma algorithm that solves fault-tolerant mutual exclusion problem in dynamic systems where the set of processes may be large and change dynamically, pr...
The architecture and implementation of the LEON-FT processor is presented. LEON-FT is a fault-tolerant 32-bit processor based on the SPARC V8 instruction set. The processors toler...
This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network abstraction called the Network Storage Stack is presented, along with...
Scott Atchley, Stephen Soltesz, James S. Plank, Mi...