Sciweavers

ICDCS
2011
IEEE
14 years 6 months ago
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
PPOPP
2005
ACM
16 years 6 days ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
GPC
2007
Springer
15 years 10 months ago
A Design of Cooperation Management System to Improve Reliability in Resource Sharing Computing Environment
Resource sharing computing is a project that realizes high performance computing by utilizing the resources of peers that are connected to the Internet. Resource sharing computing ...
Ji Su Park, Kwang-Sik Chung, Jin Gon Shon
ICDCS
2002
IEEE
15 years 11 months ago
The Complexity of Adding Failsafe Fault-Tolerance
In this paper, we focus our attention on the problem of automating the addition of failsafe fault-tolerance where fault-tolerance is added to an existing (fault-intolerant) progra...
Sandeep S. Kulkarni, Ali Ebnenasir
DEBS
2007
ACM
15 years 10 months ago
A QoS policy configuration modeling language for publish/subscribe middleware platforms
Publish/subscribe (pub/sub) middleware platforms for event-based distributed systems often provide many configurable policies that affect end-to-end quality of service (QoS). Altho...
Joe Hoffert, Douglas C. Schmidt, Aniruddha S. Gokh...