Sciweavers

» Systems Failures
DSN
2005
IEEE
16 years 20 hours ago
Probabilistic QoS Guarantees for Supercomputing Systems
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...
PPOPP
2005
ACM
15 years 12 months ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
ICDCS
2002
IEEE
15 years 11 months ago
A Practical Approach for "Zero" Downtime in an Operational Information System
An Operational Information System (OIS) supports a real-time view of an organization’s information critical to its logistical business operations. A central component of an OIS ...
Ada Gavrilovska, Karsten Schwan, Van Oleson
ICPP
2002
IEEE
15 years 11 months ago
An Efficient Fault-Tolerant Scheduling Algorithm for Real-Time Tasks with Precedence Constraints in Heterogeneous Systems
In this paper, we investigate an efficient off-line scheduling algorithm in which real-time tasks with precedence constraints are executed in a heterogeneous environment. It provi...
Xiao Qin, Hong Jiang, David R. Swanson
CLUSTER
2001
IEEE
15 years 10 months ago
GulfStream - a System for Dynamic Topology Management in Multi-domain Server Farms
This paper describes GulfStream, a scalable distributed software system designed to address the problem of managing the network topology in a multi-domain server farm. In particul...
Sameh A. Fakhouri, Germán S. Goldszmidt, Mi...