Sciweavers

» Verifying distributed systems: the operational approach
SRDS
2007
IEEE
16 years 11 days ago
Customizable Fault Tolerance for Wide-Area Replication
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...
SOSP
2005
ACM
16 years 3 months ago
BAR fault tolerance for cooperative services
This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantin...
Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement,...
FAST
2007
15 years 7 months ago
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...
Bianca Schroeder, Garth A. Gibson
USENIX
1996
15 years 7 months ago
Transparent Fault Tolerance for Parallel Applications on Networks of Workstations
This paper describes a new method for providing transparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
Daniel J. Scales, Monica S. Lam
CCGRID
2010
IEEE
15 years 7 months ago
A High-Level Interpreted MPI Library for Parallel Computing in Volunteer Environments
Idle desktops have been successfully used to run sequential and master-slave task parallel codes on a large scale in the context of volunteer computing. However, execution of messa...
Troy P. LeBlanc, Jaspal Subhlok, Edgar Gabriel