As system scale grows, node failure becomes commonplace in large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
To improve the overall dependability of large-scale cluster systems, this paper proposes an online fault detection mechanism. This mechanism can detect faults in time befor...
Clustering is a widely used approach to simplify various problems, such as routing and resource management, in mobile ad hoc networks (MANETs). We propose a n...
Ad hoc sensor networks consist of a large number of wireless sensors that communicate with each other in the absence of a fixed infrastructure. Fast self-reconfiguration and power effici...
This paper presents a new approach towards parallel I/O for message-passing (MPI) applications on clusters built with commodity hardware and an SCI interconnect: instead of using t...