Sciweavers

5005 search results - page 631 / 1001
» The Design and Analysis of Parallel Algorithms
ICPP
2008
IEEE
16 years 1 month ago
On the Reliability of Large-Scale Distributed Systems: A Topological View
In large-scale, self-organized and distributed systems, such as peer-to-peer (P2P) overlays and wireless sensor networks (WSN), a small proportion of nodes are likely to be more c...
Yuan He, Hao Ren, Yunhao Liu, Baijian Yang
HPDC
2007
IEEE
16 years 1 month ago
Ridge: combining reliability and performance in open grid platforms
Large-scale donation-based distributed infrastructures need to cope with the inherent unreliability of participant nodes. A widely-used work scheduling technique in such environme...
Krishnaveni Budati, Jason D. Sonnek, Abhishek Chan...
ICPADS
2007
IEEE
16 years 1 month ago
Federated clusters using the transparent remote Execution (TREx) environment
Due to the increasing complexity of scientific models, large-scale simulation tools often require a critical amount of computational power to produce results in a reasonable amou...
Richert Wang, Enrique Cauich, Isaac D. Scherson
ICPP
2007
IEEE
16 years 1 month ago
Achieving Reliability through Replication in a Wide-Area Network DHT Storage System
It is a challenge to design and implement a wide-area distributed hash table (DHT) which provides a storage service with high reliability. Many existing systems use replication to...
Jing Zhao, Hongliang Yu, Kun Zhang, Weimin Zheng, ...
ICPP
2007
IEEE
16 years 1 month ago
Fault-Driven Re-Scheduling For Improving System-level Fault Resilience
The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...