Sciweavers

1113 search results - page 15 / 223
» Performance under Failures of DAG-based Parallel Computing
IPPS
1999
IEEE
15 years 10 months ago
The Performance of Coordinated and Independent Checkpointing
Checkpointing is a very effective technique to tolerate the occurrence of failures in distributed and parallel applications. The existing algorithms in the literature are basicall...
Luís Moura Silva, João Gabriel Silva
ICDCS
1995
IEEE
15 years 9 months ago
Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach
One of the most sought after software innovation of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
Partha Dasgupta, Zvi M. Kedem, Michael O. Rabin
CCGRID
2008
IEEE
16 years 9 days ago
Application Resilience: Making Progress in Spite of Failure
While measures such as raw compute performance and system capacity continue to be important factors for evaluating cluster performance, such issues as system reliability...
William M. Jones, John T. Daly, Nathan DeBardelebe...
PODC
2005
ACM
15 years 11 months ago
On reliable broadcast in a radio network
We consider the problem of reliable broadcast in an infinite grid (or finite toroidal) radio network under Byzantine and crash-stop failures. We present bounds on the maximum...
Vartika Bhandari, Nitin H. Vaidya
IPPS
2007
IEEE
16 years 3 days ago
Models and Heuristics for Robust Resource Allocation in Parallel and Distributed Computing Systems
This is an overview of the robust resource allocation research efforts that have been and continue to be conducted by the CSU Robustness in Computer Systems Group. Parallel and di...
David L. Janovy, Jay Smith, Howard Jay Siegel, Ant...