Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
ct There have been many previous attempts to accelerate MT19937 using FPGAs but we believe that we can substantially improve the previous implementations to develop a higher throug...
To improve the whole dependability of large-scale cluster systems, an online fault detection mechanism is proposed in this paper. This mechanism can detect the fault in time befor...
In this paper, we consider the problem of scheduling independent identical tasks on heterogeneous processors where communication times and processing times are different. We assum...
If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel ove...