The facility design problem is a common one in manufacturing and service industries and has been studied extensively in the literature. However, restrictions on the scope of the de...
Bryan A. Norman, Alice E. Smith, Rifat Aykut Arapo...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
We develop a widely applicable algorithm to solve the fault diagnosis problem in certain distributed-memory multiprocessor systems in which there are a limited number of faulty pr...
Parallel programming is facilitated by constructs which, unlike the widely used SPMD paradigm, provide programmers with a global view of the code and data structures. These constr...
Jia Guo, Ganesh Bikshandi, Daniel Hoeflinger, Gheo...
We present an adaptive fault-tolerant wormhole routing algorithm for 2D meshes. The main feature is that with the algorithm, a normal routing message, when blocked by some faulty ...