High-end enterprise storage has traditionally consisted of monolithic systems with customized hardware, multiple redundant components and paths, and no single point of failure. Di...
Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
In this paper we present a robust software infrastructure for metacomputing. The system is intended to be used by others as a building block for large and powerful computational g...
: Virtual-build-to-order (VBTO) is a form of order fulfilment system in which the producer has the ability to search across the entire pipeline of finished stock, products in produ...
We study the problem of achieving reliable communication with quiescent algorithms (i.e., algorithms that eventually stop sending messages) in asynchronous systems with process cra...