Different schemes for large-scale networks hosting distributed applications have recently been adopted for network path marking based on the adaptive behavior of swarm-based agents. T...
The Rocks toolkit [9], [7], [10] uses a graph-based framework to describe the configuration of all node types (termed appliances) that make up a complete cluster. With hundreds of...
Greg Bruno, Mason J. Katz, Federico D. Sacerdoti, ...
Fault tolerance is a very important concern for critical high-performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
As high-performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
This paper presents three contributions to research on middleware load balancing. First, it describes the design of Cygnus, which is an extensible open-source middleware framework...
Jaiganesh Balasubramanian, Douglas C. Schmidt, Law...