We present in this paper the recent developments done in P2P-MPI, a grid middleware, concerning the fault management, which covers fault-tolerance for applications and fault detect...
— Achieving reliability in fault tolerant systems requires both avoidance and redundancy. This study focuses on avoidance as it pertains to the design of microchips. The lifecycl...
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...