Sciweavers

204 search results - page 5 / 41
» Fault-tolerant solutions for a MPI compute intensive applica...
IEEE SCC
2008
IEEE
A Fault Tolerance Approach for Enterprise Applications
Service Oriented Architectures (SOAs) have emerged as a preferred solution to tackle the complexity of large-scale, complex, distributed, and heterogeneous systems. Key to success...
Vina Ermagan, Ingolf Krüger, Massimiliano Men...
ICPP
2009
IEEE
CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems
Considerable work has been done on providing fault tolerance capabilities for different software components on large-scale high-end computing systems. Thus far, however, these fa...
Rinku Gupta, Pete Beckman, Byung-Hoon Park, Ewing ...
IPPS
2007
IEEE
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé
DSN
2002
IEEE
Generic Timing Fault Tolerance using a Timely Computing Base
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
Antonio Casimiro, Paulo Veríssimo
ICDCS
2007
IEEE
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes