Sciweavers

» Systems Failures
PODC
2005
ACM
15 years 11 months ago
Building scalable and robust peer-to-peer overlay networks for broadcasting using network coding
We propose a scheme for building peer-to-peer overlay networks for broadcasting using network coding. The scheme addresses many practical issues such as scalability, robustness, c...
Kamal Jain, László Lovász, Ph...
EUROPAR
2005
Springer
15 years 11 months ago
Faults in Large Distributed Systems and What We Can Do About Them
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale ...
George Kola, Tevfik Kosar, Miron Livny
CAI
2010
Springer
15 years 3 months ago
Achieving Cost-Effective Software Reliability Through Self-Healing
Heterogeneity, mobility, complexity and new application domains raise new software reliability issues that cannot be met cost-effectively only with classic software engineering ap...
Alessandra Gorla, Mauro Pezzè, Jochen Wuttk...
IPPS
2005
IEEE
15 years 11 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
NSDI
2007
15 years 8 months ago
Beyond One-Third Faulty Replicas in Byzantine Fault Tolerant Systems
Byzantine fault tolerant systems behave correctly when no more than f out of 3f + 1 replicas fail. When there are more than f failures, traditional BFT protocols make no guarantee...
Jinyuan Li, David Mazières
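
The replica bound quoted in the last abstract (correct behavior when no more than f out of 3f + 1 replicas fail) can be illustrated with a little arithmetic. The sketch below is only an illustration of that bound, not code from the paper; the function names are hypothetical.

```python
# Illustrative arithmetic for the classic BFT bound n >= 3f + 1.
# All names here are hypothetical helpers, not part of any BFT library.

def max_tolerated_faults(n_replicas: int) -> int:
    """Return the largest f such that 3f + 1 <= n_replicas."""
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    return (n_replicas - 1) // 3


def min_replicas(f: int) -> int:
    """Return the smallest replica count that tolerates f faulty replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def within_bound(n_replicas: int, failed: int) -> bool:
    """True if the number of failed replicas stays within the tolerated f."""
    return 0 <= failed <= max_tolerated_faults(n_replicas)


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(n, "replicas tolerate", max_tolerated_faults(n), "faulty replica(s)")
```

With 4 replicas, for example, one failure is within the bound and two are not; the second case is the regime where, per the abstract, traditional protocols make no guarantees.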